Discontinuous Petrov-Galerkin Method

The document discusses the discontinuous Petrov–Galerkin (DPG) method, a finite element approach designed for stability through locally computable test functions. It reviews the theoretical foundations, practical applications, and ongoing research related to DPG, emphasizing its ability to inherit stability from variational formulations and its suitability for adaptive algorithms. The paper also addresses challenges in ensuring stability for non-coercive Petrov–Galerkin methods and presents techniques for constructing optimal test spaces and error control.

Acta Numerica (2025), pp. 293–384
doi:10.1017/S0962492924000102

The discontinuous Petrov–Galerkin method

Leszek Demkowicz
Oden Institute, The University of Texas at Austin,
Austin, TX 78712-1229, USA
E-mail: [email protected]

Jay Gopalakrishnan
PO Box 751 (MTH), Portland State University,
Portland, OR 97207-0751, USA
E-mail: [email protected]

The discontinuous Petrov–Galerkin (DPG) method is a Petrov–Galerkin finite element
method with test functions designed for obtaining stability. These test functions
are computable locally, element by element, and are motivated by optimal test func-
tions which attain the supremum in an inf-sup condition. A profound consequence
of the use of nearly optimal test functions is that the DPG method can inherit the sta-
bility of the (undiscretized) variational formulation, be it coercive or not. This paper
combines a presentation of the fundamentals of the DPG ideas with a review of the
ongoing research on theory and applications of the DPG methodology. The scope of
the presented theory is restricted to linear problems on Hilbert spaces, but pointers to
extensions are provided. Multiple viewpoints to the basic theory are provided. They
show that the DPG method is equivalent to a method which minimizes a residual
in a dual norm, as well as to a mixed method where one solution component is an
approximate error representation function. Being a residual minimization method,
the DPG method yields Hermitian positive definite stiffness matrix systems even for
non-self-adjoint boundary value problems. Having a built-in error representation, the
method has the out-of-the-box feature that it can immediately be used in automatic
adaptive algorithms. Contrary to standard Galerkin methods, which are uninformed
about test and trial norms, the DPG method must be equipped with a concrete test
norm which enters the computations. Of particular interest are variational formu-
lations in which one can tailor the norm to obtain robust stability. Key techniques
to rigorously prove convergence of DPG schemes, including construction of Fortin
operators, which in the DPG case can be done element by element, are discussed in
detail. Pointers to open frontiers are presented.
2020 Mathematics Subject Classification: Primary 65M60, 65N12
Secondary 35F45

© The Author(s), 2025. Published by Cambridge University Press.

This is an Open Access article, distributed under the terms of the Creative Commons Attribution
licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution,
and reproduction in any medium, provided the original work is properly cited.


CONTENTS
1 Introduction
2 Optimal test spaces
3 Minimization and other viewpoints
4 Ideal DPG methods
5 Practical DPG methods
6 A posteriori error control
7 Ultraweak formulations
8 Optimal test functions in time integrators
9 Duality in DPG formulations
10 Pointers to DPG techniques for nonlinear problems
11 Further pointers and conclusion
References

1. Introduction
In variational methods, approximate solutions are sought in ‘trial’ spaces, while
equations are enforced using ‘test’ spaces. Methods with different trial and test
spaces are referred to as Petrov–Galerkin (PG) formulations. A classical result
on such methods, restated below in Theorem 1.1, provides the following useful
insight for designing Petrov–Galerkin methods: while one must choose trial spaces
with good approximation properties, test spaces may be chosen solely for stabil-
ity. Leveraging this insight, discontinuous Petrov–Galerkin (DPG) methods were
originally conceived (Demkowicz and Gopalakrishnan 2010, 2011b) as Petrov–
Galerkin methods that obtain stability automatically by local test space design
using discontinuous functions. The goal of this review is to provide an introduc-
tion to these methods and present selected recent advances. We review established
DPG techniques, give a few new avenues to existing results, and also present a few
new results. We describe the mathematical foundations for the popular features that
make the DPG method a powerful tool for solving boundary value problems, in-
cluding the ease with which automatic adaptivity can be enabled and stable solvers
for complex problems can be built.
Let us begin by describing a standard difficulty in PG formulations of boundary
value problems that the DPG method addresses. The ‘wellposedness’ (or continu-
ous dependence of solutions on data) of PG formulations need not automatically
yield stability of their discretizations, unlike coercive Galerkin formulations. In
simpler formulations with equal trial and test spaces, once wellposedness of the
variational formulation (usually set in infinite-dimensional Sobolev spaces) is es-
tablished through coercivity, stability of the computable Galerkin discretization
(using finite-dimensional subspaces) follows. The situation for non-coercive PG
methods is more complicated.

To make it precise, we introduce the following setting we shall use throughout
this review (and a few departures from it will be announced with ample warning).
Let 𝑋 and 𝑌 denote two Hilbert spaces over the complex field C. Let 𝑌∗ denote
the space of continuous antilinear (or conjugate-linear) functionals on 𝑌 and let
𝑏(·, ·) : 𝑋 × 𝑌 → C denote a continuous sesquilinear form. A formulation is
wellposed if

for any ℓ ∈ 𝑌∗, there is a unique 𝑥 ∈ 𝑋 satisfying
\[
b(x, y) = \ell(y) \quad \text{for all } y \in Y. \tag{1.1}
\]
By the well-known theory of mixed systems (Babuška 1971, Brezzi 1974, Ern and
Guermond 2021, Nečas 1962), we know that (1.1) holds if and only if there is a
𝛾 > 0 such that
\[
\inf_{0 \ne z \in X} \; \sup_{0 \ne y \in Y} \frac{|b(z, y)|}{\|z\|_X \, \|y\|_Y} \ge \gamma, \quad \text{and} \tag{1.2a}
\]
\[
\{ y \in Y : b(z, y) = 0 \text{ for all } z \in X \} = \{0\}, \tag{1.2b}
\]
or equivalently
\[
\inf_{0 \ne y \in Y} \; \sup_{0 \ne z \in X} \frac{|b(z, y)|}{\|y\|_Y \, \|z\|_X} \ge \gamma, \quad \text{and} \tag{1.3a}
\]
\[
\{ z \in X : b(z, y) = 0 \text{ for all } y \in Y \} = \{0\}. \tag{1.3b}
\]

For a computationally realizable Petrov–Galerkin method, one uses finite-dimensional
subspaces 𝑋ℎ ⊂ 𝑋 and 𝑌ℎ ⊂ 𝑌 of equal dimension, dim(𝑋ℎ) = dim(𝑌ℎ).
Here ℎ is a subscript related to the finite dimension. Letting
\[
\|b\| = \sup_{0 \ne x \in X} \; \sup_{0 \ne y \in Y} \frac{|b(x, y)|}{\|x\|_X \, \|y\|_Y},
\]

a classical result of Babuška (see Babuška 1971, Babuška, Aziz, Fix and Kellogg
1972 or Xu and Zikatanov 2003) can be stated as follows.

Theorem 1.1. In the above setting of Hilbert spaces 𝑋, 𝑌 and finite-dimensional
subspaces 𝑋ℎ ⊂ 𝑋, 𝑌ℎ ⊂ 𝑌 satisfying dim(𝑋ℎ) = dim(𝑌ℎ), suppose (1.1), (1.2) or
(1.3) holds. If, in addition, there exists a 𝛾ℎ > 0 such that
\[
\inf_{0 \ne z_h \in X_h} \; \sup_{0 \ne y_h \in Y_h} \frac{|b(z_h, y_h)|}{\|z_h\|_X \, \|y_h\|_Y} \ge \gamma_h, \tag{1.4}
\]
then there is a unique 𝑥ℎ ∈ 𝑋ℎ satisfying
\[
b(x_h, y_h) = \ell(y_h) \quad \text{for all } y_h \in Y_h \tag{1.5}
\]
and
\[
\|x - x_h\|_X \le \frac{\|b\|}{\gamma_h} \inf_{z_h \in X_h} \|x - z_h\|_X. \tag{1.6}
\]
The theorem clarifies the above-mentioned difficulty in inheriting discrete sta-
bility from the wellposedness of the variational problem. Specifically, the standard
difficulty in the analysis of Petrov–Galerkin discretizations of the form (1.5) is that
the inf-sup condition (1.2a) does not generally imply the discrete inf-sup condition
(1.4). Hence, unlike coercive forms 𝑏(·, ·), it is easy to obtain unstable PG meth-
ods even when the variational equation (1.1) is wellposed. A second important
observation is that in Theorem 1.1, the test space 𝑌ℎ is absent from the error es-
timate (1.6). It only appears in the inf-sup condition (1.4), which is responsible for
stability. The approximation rates are determined by the trial space 𝑋ℎ in (1.6).
Letting the trial space carry the burden of achieving good approximation properties
liberates the test space from it. The takeaway is that we can focus solely on stability
when designing test spaces. Hence techniques to design discrete subspaces 𝑌ℎ that
guarantee the discrete inf-sup condition (1.4), with 𝛾 ℎ independent of the finite
dimension, are useful.
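To make the role of (1.4) concrete, the following is a minimal sketch (not taken from the paper) of how the discrete inf-sup constant can be evaluated once bases are fixed. It assumes real arithmetic, Gram matrices `G_X` and `G_Y` of the trial and test bases, and a matrix `B` with entries 𝑏(trial basis function 𝑗, test basis function 𝑖); the function name `discrete_inf_sup` and all matrices are hypothetical. Under these assumptions 𝛾ℎ is the smallest singular value of the matrix of 𝑏 expressed in orthonormalized coordinates, and the largest singular value is the norm of 𝑏 restricted to the discrete spaces.

```python
import numpy as np

def discrete_inf_sup(B, G_X, G_Y):
    """Return (gamma_h, norm_b_h) for b(z, y) = y^T B z.

    B   : (m, n) array, B[i, j] = b(trial_j, test_i)
    G_X : (n, n) symmetric positive definite Gram matrix of the trial basis
    G_Y : (m, m) symmetric positive definite Gram matrix of the test basis
    """
    L_X = np.linalg.cholesky(G_X)            # G_X = L_X L_X^T
    L_Y = np.linalg.cholesky(G_Y)            # G_Y = L_Y L_Y^T
    M = np.linalg.solve(L_Y, B)              # L_Y^{-1} B
    M = np.linalg.solve(L_X, M.T).T          # L_Y^{-1} B L_X^{-T}
    s = np.linalg.svd(M, compute_uv=False)   # singular values, descending
    if B.shape[0] < B.shape[1]:
        # fewer test than trial functions: some z has b(z, .) = 0
        return 0.0, s[0]
    return s[-1], s[0]

# A well-balanced pair of spaces and a nearly degenerate one.
gamma_good, norm_good = discrete_inf_sup(np.eye(2), np.diag([4.0, 1.0]), np.eye(2))
gamma_bad, norm_bad = discrete_inf_sup(np.diag([1.0, 1e-3]), np.eye(2), np.eye(2))
```

In the second call the quasi-optimality constant ∥𝑏∥/𝛾ℎ of (1.6) is about a thousand, which illustrates how a poor test space can degrade the estimate without touching the trial space.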
The next section (Section 2) provides such a technique through the concept
of optimal test functions which attain the supremum in an inf-sup condition. In
Section 3 we shall see that DPG methods can equivalently be thought of as methods
that minimize a residual in a non-standard norm, as well as a non-standard mixed
method with an approximate error representation as a discrete solution component.
One of the key steps that enable local computation of the optimal test functions
is a reformulation of the boundary value problem using a test space of functions
which have no continuity constraints across elements. Such spaces are often
referred to as ‘broken’ spaces and the process is akin to ‘hybridization’. We will
see these terms again when the test space localization technique is introduced
precisely in Section 4. Next, we address the usual practice of computing the
optimal test functions inexactly. The loss of optimality in the stability constant and
techniques to regain discrete stability despite the inexact computation are topics
covered in Section 5. The key ingredient there is a local Fortin operator. The built-
in error estimator contained in all DPG methods is then described in Section 6.
The wide scope of applicability of DPG methods becomes clearer in Section 7,
where we show how to accomplish the above-mentioned localization through a
reformulation in broken graph spaces of very general partial differential equations
(PDE). Application of optimal test functions to create enhanced time integrators is
the subject of Section 8. Duality arguments, certain drawbacks in applying them
to DPG methods, a dual DPG* method, and application to estimating error in goal
functionals are briefly discussed in Section 9. Section 10 contains a collection of
remarks on ongoing efforts to incorporate DPG techniques into nonlinear boundary
value problems, a research area where the last word seems yet to be written. We
conclude in Section 11 with pointers to further works whose details could not be
included in this paper.

2. Optimal test spaces

Is it possible to find a test space 𝑌ℎ for which the exact inf-sup condition (1.2a)
implies the discrete inf-sup condition (1.4)? We begin with a simple and affirmative
answer in Proposition 2.2 below. This then gives rise to an ‘ideal Petrov–Galerkin
method’, a precursor to the DPG method.
Given any trial space 𝑋ℎ, we define its optimal test space for the continuous
sesquilinear form 𝑏(·, ·) : 𝑋 × 𝑌 → C by
\[
Y_h^{\mathrm{opt}} = T(X_h), \tag{2.1}
\]
where 𝑇 : 𝑋 → 𝑌, a ‘trial-to-test operator’, is defined by
\[
(T z, y)_Y = b(z, y) \quad \text{for all } y \in Y, \; z \in X. \tag{2.2}
\]
Equation (2.2) uniquely defines a 𝑇 𝑧 for any given 𝑧 ∈ 𝑋 by the Riesz representation
theorem. We shall call 𝑇 𝑧 an optimal test function because it solves the optimization
problem stated next.
Proposition 2.1 (Optimizer). For any 𝑧 ∈ 𝑋, the maximum of
\[
f_z(y) = \frac{|b(z, y)|}{\|y\|_Y}
\]
over all non-zero 𝑦 ∈ 𝑌 is attained at 𝑦 = 𝑇𝑧.
Proof. Rewriting 𝑓𝑧 using (2.2), duality in Hilbert spaces implies
\[
\sup_{0 \ne y \in Y} f_z(y) = \sup_{0 \ne y \in Y} \frac{|(T z, y)_Y|}{\|y\|_Y} = \|T z\|_Y.
\]
Moreover, 𝑓𝑧(𝑇𝑧) = ∥𝑇𝑧∥𝑌.
Proposition 2.2 (Exact inf-sup condition ⟹ Discrete inf-sup condition). If the
inf-sup condition (1.2a) holds with some 𝛾 > 0, then the discrete inf-sup condition
(1.4) holds with some 𝛾ℎ ≥ 𝛾 > 0 when we set 𝑌ℎ = 𝑌ℎ^opt.
Proof. For any non-zero 𝑧ℎ ∈ 𝑋ℎ, letting
\[
s_1 = \sup_{0 \ne y \in Y} \frac{|b(z_h, y)|}{\|y\|_Y}, \qquad
s_2 = \sup_{0 \ne y_h \in Y_h^{\mathrm{opt}}} \frac{|b(z_h, y_h)|}{\|y_h\|_Y},
\]
it is obvious that 𝑠1 ≥ 𝑠2. To prove that 𝑠1 ≤ 𝑠2, since 𝑠1 = ∥𝑇𝑧ℎ∥𝑌 by Proposition 2.1
and 𝑇𝑧ℎ is a non-zero element of 𝑌ℎ^opt, we have
\[
s_1 = \frac{|b(z_h, T z_h)|}{\|T z_h\|_Y} \le s_2,
\]
so 𝑠1 = 𝑠2. Hence the discrete inf-sup condition (1.4) holds with 𝛾ℎ ≥ 𝛾.
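Proposition 2.1 can be checked numerically in a finite-dimensional stand-in for 𝑋 and 𝑌. The sketch below assumes real spaces represented by coordinate vectors, a hypothetical random symmetric positive definite Gram matrix `G_Y` for the 𝑌-inner product, and a hypothetical matrix `B` with 𝑏(𝑧, 𝑦) = 𝑦ᵀ𝐵𝑧. In that setting (2.2) becomes the linear system `G_Y (T z) = B z`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

A = rng.standard_normal((n, n))
G_Y = A @ A.T + n * np.eye(n)                       # Gram matrix of the Y-inner product
B = rng.standard_normal((n, n)) + n * np.eye(n)     # b(z, y) = y^T B z

# Trial-to-test operator from (2.2): (T z, y)_Y = b(z, y) for all y.
T = np.linalg.solve(G_Y, B)

def f(z, y):
    """The quotient f_z(y) = |b(z, y)| / ||y||_Y of Proposition 2.1."""
    return abs(y @ B @ z) / np.sqrt(y @ G_Y @ y)

z = rng.standard_normal(n)
y_opt = T @ z                                       # optimal test function for z
sup_value = np.sqrt(y_opt @ G_Y @ y_opt)            # ||T z||_Y

# No randomly drawn test vector should beat the optimal test function.
random_values = [f(z, rng.standard_normal(n)) for _ in range(1000)]
```

The random search only samples the supremum, so it illustrates the statement rather than proving it; the equality 𝑓𝑧(𝑇𝑧) = ∥𝑇𝑧∥𝑌 is the part that holds to rounding error.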

Definition 2.3. For any trial subspace 𝑋ℎ ⊂ 𝑋, the ideal Petrov–Galerkin (IPG)
method finds 𝑥ℎ ∈ 𝑋ℎ solving
\[
b(x_h, y_h) = \ell(y_h) \quad \text{for all } y_h \in Y_h^{\mathrm{opt}}, \tag{2.3}
\]
where 𝑌ℎ^opt ⊂ 𝑌 is computed using the 𝑌-inner product by (2.1)–(2.2).

Theorem 2.4 (Quasioptimality). If (1.3) holds, then the IPG method (2.3) is
uniquely solvable for 𝑥ℎ and
\[
\|x - x_h\|_X \le \frac{\|b\|}{\gamma} \inf_{z_h \in X_h} \|x - z_h\|_X, \tag{2.4}
\]
where 𝑥 is the unique exact solution of (1.1).

Proof. We apply Theorem 1.1. To verify that 𝑇 is injective, suppose 𝑇𝑧 = 0.
Then, by (2.2), we have 𝑏(𝑧, 𝑦) = 0 for all 𝑦 ∈ 𝑌, so (1.3b) implies that 𝑧 = 0. Thus
dim(𝑋ℎ) = dim(𝑌ℎ^opt).
Next, since (1.3) implies (1.2), the other inf-sup condition,
\[
\gamma \|z\|_X \le \sup_{0 \ne y \in Y} \frac{|b(z, y)|}{\|y\|_Y} \quad \text{for all } z \in X, \tag{2.5}
\]
holds with the same constant 𝛾. Hence Proposition 2.2 shows that the discrete
inf-sup condition (1.4) holds with 𝛾ℎ ≥ 𝛾, so Theorem 1.1 gives the
result.

Example 2.5 (𝐿² least-squares). Suppose 𝛺 ⊂ R^𝑁, 𝑁 ≥ 1, is an open set and
𝐴 : 𝑋 → 𝐿²(𝛺)^𝑚 is a continuous bijective linear operator (where, as before, 𝑋 is
some Hilbert space). Then, setting 𝑌 = 𝐿²(𝛺)^𝑚, the problem of finding a 𝑢 ∈ 𝑋
such that 𝐴𝑢 = 𝑓, for any given 𝑓 ∈ 𝑌, can be put into a variational formulation by
setting
\[
b(u, v) = (A u, v)_Y, \qquad \ell(v) = (f, v)_Y. \tag{2.6}
\]
Then (2.2) implies that 𝑇𝑢 = 𝐴𝑢, so 𝑌ℎ^opt = 𝐴𝑋ℎ. Hence (2.3) reduces to
\[
(A x_h, A z_h)_Y = (f, A z_h)_Y \quad \text{for all } z_h \in X_h,
\]
that is, for this example, the IPG method of (2.3) coincides with the standard 𝐿 2 (𝛺)-
based least-squares method, which has been well studied (Bochev and Gunzburger
2009, Cai, Lazarov, Manteuffel and McCormick 1994). Not all DPG methods are
𝐿 2 least-squares methods. But as we will see in the next section, all DPG methods
minimize a residual in some norm, not necessarily the 𝐿 2 -norm.
Note that Theorem 2.4 gives an error estimate for the 𝐿 2 least-squares method
(2.6), since its assumptions are readily verified. Indeed, by the injectivity of 𝐴, any
𝑧 ∈ 𝑋 satisfying 𝑏(𝑧, 𝑦) = 0 for all 𝑦 ∈ 𝐿 2 (𝛺) yields 𝐴𝑧 = 0, which implies that
𝑧 = 0. Also,
\[
\sup_{0 \ne z \in X} \frac{|(A z, y)_Y|}{\|z\|_X} \ge \frac{|(y, y)_Y|}{\|A^{-1} y\|_X} \ge \gamma \|y\|_Y,
\]
with 𝛾 = ∥𝐴⁻¹∥⁻¹, so (1.3) holds. Hence the error estimate (2.4) holds.
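As an illustration of Example 2.5, here is a small sketch of the 𝐿² least-squares method for the model operator 𝐴𝑢 = 𝑢′ on (0, 1) with 𝑢(0) = 0. The choices below are assumptions made only for this illustration: a monomial trial basis 𝑥, 𝑥², …, a Gauss–Legendre rule standing in for the 𝐿²-inner product, and the data 𝑓 = cos (so that 𝑢 = sin).

```python
import numpy as np

# Gauss-Legendre quadrature mapped from (-1, 1) to (0, 1).
xg, wg = np.polynomial.legendre.leggauss(20)
xq, wq = 0.5 * (xg + 1.0), 0.5 * wg

p = 4
ks = np.arange(1, p + 2)                     # trial basis x^k, k = 1..p+1 (vanishes at 0)
A_h = ks * xq[:, None] ** (ks - 1)           # column k holds (A x^k)(xq) = k x^(k-1)
f = np.cos(xq)                               # right-hand side sampled at the nodes

# Normal equations (A x_h, A z_h)_Y = (f, A z_h)_Y with quadrature weights.
normal_matrix = A_h.T @ (wq[:, None] * A_h)
normal_rhs = A_h.T @ (wq * f)
c = np.linalg.solve(normal_matrix, normal_rhs)

def u_h(x):
    """Evaluate the least-squares approximation at the points x."""
    return (np.asarray(x)[..., None] ** ks) @ c

residual = f - A_h @ c                       # f - A u_h at the quadrature nodes
```

The system matrix is symmetric positive definite even though 𝐴 is a first-order (non-self-adjoint) operator, which is the feature that carries over to all DPG methods.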

Example 2.6 (A one-dimensional boundary value problem). Letting 𝛺 = (0, 1),
the unit interval in R, and 𝑓 ∈ 𝐿²(𝛺), consider the boundary value problem to find
𝑢(𝑥) satisfying
\[
u' = f \quad \text{in } \Omega, \tag{2.7a}
\]
\[
u(0) = 0. \tag{2.7b}
\]
This fits into the framework of Example 2.5 and yields
the variational form (2.6) with 𝐴𝑢 = 𝑑𝑢/𝑑𝑥 ≡ 𝑢′, 𝑋 = {𝑢 ∈ 𝐻¹(𝛺) : 𝑢(0) = 0},
𝑚 = 1 and 𝑌 = 𝐿²(𝛺) (and it is easy to prove that 𝐴 : 𝑋 → 𝑌 is a bijection).
A different variational formulation for (2.7) can be obtained if we integrate by
parts after multiplying (2.7a) by a test function 𝑣. Then, using (2.7b) and letting the
unknown value 𝑢(1) be a separate variable 𝑢̂1, to be determined, we have
\[
-\int_0^1 u v' + \hat u_1 v(1) = \int_0^1 f v.
\]

Grouping the trial variable into 𝑧 = (𝑢, 𝑢̂1), set
\[
b(z, v) \equiv b((u, \hat u_1), v) = \hat u_1 v(1) - \int_0^1 u v', \qquad \ell(v) = \int_0^1 f v. \tag{2.8}
\]
Set the spaces and their norms by
\[
X = L^2(\Omega) \times \mathbb{R}, \qquad Y = H^1(\Omega),
\]
\[
\|z\|_X^2 \equiv \|(u, \hat u_1)\|_X^2 = \|u\|_{L^2(\Omega)}^2 + |\hat u_1|^2, \qquad
\|v\|_Y^2 = \|v'\|_{L^2(\Omega)}^2 + |v(1)|^2.
\]
By a Sobolev inequality, ∥𝑣∥𝑌 is equivalent to the standard 𝐻¹(𝛺)-norm. With these
settings, it is easy to prove that (1.3) holds with
\[
\gamma = \|b\| = 1. \tag{2.9}
\]
One can also easily calculate the trial-to-test operator by analytically solving
(2.2) for this example: for any 𝑧 = (𝑢, 𝑢̂1) ∈ 𝑋,
\[
T z \equiv T(u, \hat u_1) = \hat u_1 + \int_x^1 u(s) \, ds. \tag{2.10}
\]
This implies that letting 𝑃𝑝(𝛺) denote the space of polynomials of degree at most
𝑝, restricted to 𝛺, and setting the discrete trial space to 𝑋ℎ = 𝑃𝑝(𝛺) × R, we have
\[
Y_h^{\mathrm{opt}} = P_{p+1}(\Omega).
\]
The solution 𝑥 ℎ = (𝑢 ℎ , 𝑢ˆ 1,ℎ ) of the resulting IPG method, in view of Theorem 2.4
and (2.9), is interesting in that 𝑢 ℎ equals the 𝐿 2 (𝛺)-projection of 𝑢 onto 𝑃 𝑝 (𝛺).
In the general case, although one cannot expect the method to deliver the best
𝐿 2 -approximation from the trial space, the solution is the best approximation in
some norm, as will be proved in the next section.
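The claims of Example 2.6 can be reproduced in a few lines. The sketch below assembles the IPG system (2.3) for the form (2.8), using the closed-form optimal test functions (2.10), and compares 𝑢ℎ with the 𝐿²(𝛺)-projection of the exact solution. The monomial basis, the quadrature rule, the data 𝑓 = cos (so 𝑢 = sin) and all function names are assumptions made for this illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leggauss

xg, wg = leggauss(30)
xq, wq = 0.5 * (xg + 1.0), 0.5 * wg          # Gauss quadrature on (0, 1)

def integrate(values):
    """Approximate the integral over (0, 1) of a function sampled at xq."""
    return float(np.sum(wq * values))

def optimal_test_function(u, u1):
    """T(u, u1)(x) = u1 + int_x^1 u(s) ds, as in (2.10)."""
    U = u.integ()                            # antiderivative with U(0) = 0
    return Polynomial([u1 + U(1.0)]) - U

def b(u, u1, v):
    """b((u, u1), v) = u1 v(1) - int_0^1 u v' dx, as in (2.8)."""
    return u1 * v(1.0) - integrate(u(xq) * v.deriv()(xq))

def ipg_solve(f, p):
    """Solve (2.3) on X_h = P_p x R with the optimal test space."""
    zero = Polynomial([0.0])
    trial = [(Polynomial.basis(k), 0.0) for k in range(p + 1)] + [(zero, 1.0)]
    tests = [optimal_test_function(u, u1) for u, u1 in trial]
    S = np.array([[b(u, u1, v) for u, u1 in trial] for v in tests])
    rhs = np.array([integrate(f(xq) * v(xq)) for v in tests])
    coef = np.linalg.solve(S, rhs)
    return Polynomial(coef[:-1]), coef[-1]

p = 3
u_h, u1_h = ipg_solve(np.cos, p)             # f = cos, exact solution u = sin

# L2(0, 1)-projection of the exact solution onto P_p (monomial Gram matrix).
M = np.array([[1.0 / (i + j + 1) for j in range(p + 1)] for i in range(p + 1)])
g = np.array([integrate(np.sin(xq) * xq**i) for i in range(p + 1)])
u_proj = Polynomial(np.linalg.solve(M, g))
```

Under these assumptions the coefficients of `u_h` and `u_proj` agree to rounding error and `u1_h` reproduces 𝑢(1), consistent with the discrete solution being the 𝑋-orthogonal projection of (𝑢, 𝑢(1)). Monomials are used only for brevity; they become ill-conditioned for larger 𝑝.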

Bibliographical notes. The material of this section is based on Demkowicz and
Gopalakrishnan (2011b). Sequels by Demkowicz, Gopalakrishnan and Niemi
(2012a) and Zitelli et al. (2011) developed the theme further. A prequel by
Demkowicz and Gopalakrishnan (2010) focused solely on the transport equation (not
discussed in this review) and used a test space consisting of parts of analytically
solvable optimal test functions.

3. Minimization and other viewpoints

The ideal Petrov–Galerkin method (2.3) admits two equivalent reformulations, one
as a least-squares method that minimizes the residual in a non-standard dual norm,
and another as a mixed Galerkin method (with the same trial and test spaces)
solving an associated min-max for a saddle point.
Let 𝑅𝑌 : 𝑌 → 𝑌∗ denote the standard Riesz map defined by (𝑅𝑌𝑦)(𝑣) = (𝑦, 𝑣)𝑌,
for all 𝑦 and 𝑣 in 𝑌. Here and throughout, the inner product of 𝑌 is denoted by
(·, ·)𝑌. The Riesz map is well known to be invertible and isometric:
\[
\|R_Y y\|_{Y^*} = \|y\|_Y. \tag{3.1}
\]
Let 𝐵 : 𝑋 → 𝑌∗ be the operator generated by the form 𝑏(·, ·), i.e. (𝐵𝑥)(𝑦) = 𝑏(𝑥, 𝑦)
for all 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌. Since the defining equation of 𝑇, namely (2.2), can be
rewritten using this notation as 𝑅𝑌𝑇𝑧 = 𝐵𝑧, we see that
\[
T = R_Y^{-1} \circ B. \tag{3.2}
\]

3.1. Equivalent characterization as a residual minimizer

Definition 3.1. On the trial space, we define an energy norm of 𝑧 ∈ 𝑋 by |||𝑧|||𝑋 ≔
∥𝑇𝑧∥𝑌. Clearly, by Proposition 2.1,
\[
|||z|||_X = \|T z\|_Y = \sup_{0 \ne y \in Y} \frac{|b(z, y)|}{\|y\|_Y}.
\]
This is indeed a norm if (1.2a) holds, due to the easily seen norm equivalence
\[
\gamma \|z\|_X \le |||z|||_X \le \|b\| \, \|z\|_X \quad \text{for all } z \in X.
\]
Theorem 3.2 (Residual minimization). Suppose (1.1) holds. Then the following
are equivalent statements for any given 𝑥ℎ in 𝑋ℎ.
(a) The 𝑥ℎ is the unique solution of the IPG method (2.3).
(b) The 𝑥ℎ is the best approximation to 𝑥 from 𝑋ℎ in the sense that
\[
|||x - x_h|||_X = \inf_{z_h \in X_h} |||x - z_h|||_X.
\]
(c) The 𝑥ℎ minimizes the residual in the following sense:
\[
x_h = \arg\min_{z_h \in X_h} \|\ell - B z_h\|_{Y^*}.
\]

Proof. (a) ⟺ (b) By definition of the IPG method, 𝑥ℎ solves (2.3) if and only
if 𝑏(𝑥 − 𝑥ℎ, 𝑦ℎ) = 0 for all 𝑦ℎ ∈ 𝑌ℎ^opt. By the definition of the optimal test space,
this is equivalent to
\[
b(x - x_h, T z_h) = 0 \quad \text{for all } z_h \in X_h,
\]
which, in turn, is equivalent to
\[
(T(x - x_h), T z_h)_Y = 0 \quad \text{for all } z_h \in X_h,
\]
due to (2.2). The result follows since (𝑇·, 𝑇·)𝑌 is the inner product generating the
|||·|||𝑋-norm.
(b) ⟺ (c) In view of (3.2),
\[
|||x - z_h|||_X = \|T(x - z_h)\|_Y = \|R_Y^{-1} B(x - z_h)\|_Y.
\]
Hence, by the isometry of the Riesz map (3.1), item (b) holds if and only if
\[
\|B(x - x_h)\|_{Y^*} = \inf_{z_h \in X_h} \|B(x - z_h)\|_{Y^*},
\]
which, since ℓ = 𝐵𝑥, is the same as (c).
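Theorem 3.2 has a direct matrix counterpart, sketched here under simplifying assumptions: real arithmetic, a finite-dimensional test space with a hypothetical Gram matrix `G`, a matrix `B` with 𝑏(𝑧, 𝑦) = 𝑦ᵀ𝐵𝑧 for trial coefficient vectors 𝑧, and a load vector `ell`. The dual norm of a functional with coefficient vector 𝑟 is then (𝑟ᵀ𝐺⁻¹𝑟)^{1/2}, and the IPG system is the normal equation of the corresponding weighted least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3                                   # dim of test space, dim of X_h

A = rng.standard_normal((m, m))
G = A @ A.T + m * np.eye(m)                   # Gram matrix of the test inner product
B = rng.standard_normal((m, n))               # b(z, y) = y^T B z
ell = rng.standard_normal(m)                  # load functional

def dual_norm(r):
    """Norm in Y* of the functional with coefficient vector r."""
    return np.sqrt(r @ np.linalg.solve(G, r))

# IPG: test with the optimal test functions T = G^{-1} B.
T = np.linalg.solve(G, B)
stiffness = T.T @ B                           # equals B^T G^{-1} B
x_h = np.linalg.solve(stiffness, T.T @ ell)

# The residual in the dual norm should not decrease under perturbations.
best = dual_norm(ell - B @ x_h)
perturbed = [dual_norm(ell - B @ (x_h + 0.1 * rng.standard_normal(n)))
             for _ in range(200)]
```

The matrix `stiffness` is symmetric positive definite although `B` itself has no symmetry, which mirrors the remark in the abstract about Hermitian positive definite stiffness matrices.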

3.2. Equivalent characterization as a mixed formulation

Definition 3.3. Let 𝑥 be as in (1.1) and let 𝑥ℎ solve (2.3). Following earlier
terminology, the Riesz representation of the residual, namely 𝜀 = 𝑅𝑌⁻¹(ℓ − 𝐵𝑥ℎ), is
often called the error representation (function). Clearly,
\[
\|\varepsilon\|_Y = \|R_Y^{-1} B(x - x_h)\|_Y = \|T(x - x_h)\|_Y = |||x - x_h|||_X,
\]
that is, the 𝑌-norm of 𝜀 measures the error in the energy norm. Note that 𝜀 is the
unique element of 𝑌 satisfying
\[
(\varepsilon, y)_Y = \ell(y) - b(x_h, y) \quad \text{for all } y \in Y. \tag{3.3}
\]
Theorem 3.4 (Mixed Galerkin reformulation). The following are equivalent
statements.
(a) 𝑥ℎ ∈ 𝑋ℎ solves the IPG method (2.3).
(b) 𝑥ℎ and 𝜀 solve the mixed formulation
\[
(\varepsilon, y)_Y + b(x_h, y) = \ell(y) \quad \text{for all } y \in Y, \tag{3.4a}
\]
\[
b(z_h, \varepsilon) = 0 \quad \text{for all } z_h \in X_h. \tag{3.4b}
\]
(c) 𝜀 and 𝑥ℎ form the saddle point of
\[
L(y, z) = \tfrac{1}{2} \|y\|_Y^2 - \ell(y) + b(z, y)
\]
on 𝑌 × 𝑋ℎ,
\[
L(\varepsilon, x_h) = \min_{y \in Y} \max_{z \in X_h} L(y, z).
\]

Proof. (a) ⟹ (b) Equation (3.4a) is the same as (3.3), so we only need to prove
(3.4b). To this end,
\[
b(z_h, \varepsilon) = (T z_h, \varepsilon)_Y = (T z_h, R_Y^{-1}(\ell - B x_h))_Y = (T z_h, T(x - x_h))_Y,
\]
which, being the conjugate of 𝑏(𝑥 − 𝑥ℎ, 𝑇𝑧ℎ), vanishes.

(b) ⟹ (a) Since (3.4a) implies 𝑏(𝑥ℎ, 𝑦ℎ) = ℓ(𝑦ℎ) − (𝜀, 𝑦ℎ)𝑌 for all 𝑦ℎ ∈ 𝑌ℎ^opt, it
suffices to prove that (𝜀, 𝑦ℎ)𝑌 = 0 for all 𝑦ℎ ∈ 𝑌ℎ^opt. Any 𝑦ℎ ∈ 𝑌ℎ^opt is of the form
𝑦ℎ = 𝑇𝑧ℎ for some 𝑧ℎ ∈ 𝑋ℎ, so
\[
(\varepsilon, y_h)_Y = \overline{(T z_h, \varepsilon)_Y} = \overline{b(z_h, \varepsilon)} = 0
\]
by (3.4b).
(b) ⇐⇒ (c) This follows from classical results on mixed methods (see e.g.
Brezzi and Fortin 1991, Ch. II) or duality theory (see e.g. Ekeland and Témam
1999, Ch. VI).
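In matrix form, the mixed formulation (3.4) is a symmetric saddle-point system. The following sketch checks, under the assumptions of real arithmetic and hypothetical random data (`G`, `B`, `ell` as Gram matrix, form matrix and load vector), that it returns the same 𝑥ℎ as the residual-minimizing normal equations and that ∥𝜀∥𝑌 equals the dual norm of the residual.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 8, 3                                   # dim of test space, dim of X_h

A = rng.standard_normal((m, m))
G = A @ A.T + m * np.eye(m)                   # Gram matrix of the test inner product
B = rng.standard_normal((m, n))               # b(z, y) = y^T B z
ell = rng.standard_normal(m)

# Saddle-point system for (3.4a)-(3.4b):
#   G eps + B x_h = ell,   B^T eps = 0.
K = np.block([[G, B], [B.T, np.zeros((n, n))]])
sol = np.linalg.solve(K, np.concatenate([ell, np.zeros(n)]))
eps, x_mixed = sol[:m], sol[m:]

# Reference: normal equations of the residual minimization problem.
GinvB = np.linalg.solve(G, B)
x_normal = np.linalg.solve(B.T @ GinvB, GinvB.T @ ell)

# ||eps||_Y versus the dual norm of the residual ell - B x_h.
eps_norm = np.sqrt(eps @ G @ eps)
residual = ell - B @ x_mixed
residual_dual_norm = np.sqrt(residual @ np.linalg.solve(G, residual))
```

The block system is indefinite but avoids forming 𝐺⁻¹ explicitly, and it delivers the error representation `eps` as a by-product, which is what makes this viewpoint attractive for adaptivity.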

3.3. Optimal test norm and another trial-to-test operator

We have seen in Theorem 3.2(b) that the ideal PG method produces the best
approximation in the energy norm |||·||| 𝑋 (defined in Definition 3.1). In practice, one
may want the best approximation in a given trial space norm, say ∥·∥𝑋. Is it possible
to engineer a test space norm such that the solution is the best approximation in a
desired trial space norm? The simple answer in the affirmative is provided by the
optimal test norm, introduced below in the context of a generalized duality pairing.
We write the duality pairing in any Hilbert space 𝑌 as either
\[
f(y) \quad \text{or} \quad \langle f, y \rangle_Y. \tag{3.5a}
\]
Both denote the action of some 𝑓 ∈ 𝑌∗ on a 𝑦 ∈ 𝑌. The duality pairing satisfies
\[
\|f\|_{Y^*} = \sup_{0 \ne y \in Y} \frac{|\langle f, y \rangle_Y|}{\|y\|_Y}
\quad \text{and} \quad
\|y\|_Y = \sup_{0 \ne f \in Y^*} \frac{|\langle f, y \rangle_Y|}{\|f\|_{Y^*}}. \tag{3.5b}
\]
Definition 3.5. Analogous to the energy norm |||·|||𝑋 in Definition 3.1, we define
the optimal test norm ||||𝑦||||𝑌 of any 𝑦 in the test space 𝑌 by
\[
||||y||||_Y = \sup_{0 \ne z \in X} \frac{|b(z, y)|}{\|z\|_X}. \tag{3.6}
\]
This is obviously a norm when (1.3) holds. We shall refer to a generic sesquilinear
form 𝑏(·, ·) : 𝑋 × 𝑌 → C as a generalized duality pairing if
\[
|||z|||_X = \|z\|_X \quad \text{and} \quad ||||y||||_Y = \|y\|_Y \tag{3.7}
\]
hold for all 𝑧 ∈ 𝑋 and 𝑦 ∈ 𝑌. This terminology is motivated by the standard duality
pairing 𝑏(·, ·) = ⟨·, ·⟩𝑌 in the case 𝑋 = 𝑌∗, where (3.5) implies
\[
\|z\|_X = \sup_{0 \ne y \in Y} \frac{|b(z, y)|}{\|y\|_Y}
\quad \text{and} \quad
\|y\|_Y = \sup_{0 \ne z \in X} \frac{|b(z, y)|}{\|z\|_X}, \tag{3.8}
\]
a pair of identities equivalent to (3.7). One of the pair implies the other, as shown
shortly.

Let 𝐵 : 𝑋 → 𝑌∗ denote the operator generated by 𝑏(·, ·) by ⟨𝐵𝑧, 𝑦⟩𝑌 = 𝑏(𝑧, 𝑦) for
all 𝑧 ∈ 𝑋, 𝑦 ∈ 𝑌. Identifying the bidual 𝑌∗∗ with 𝑌, the adjoint 𝐵∗ : 𝑌 → 𝑋∗ of 𝐵
satisfies
\[
(B^* y)(z) = b(z, y) \quad \text{for all } z \in X, \; y \in Y. \tag{3.9}
\]
Using the Riesz maps in 𝑋 and 𝑌, we then immediately have
\[
b(z, y) = (R_Y^{-1} B z, y)_Y = (z, R_X^{-1} B^* y)_X, \quad z \in X, \; y \in Y. \tag{3.10}
\]
This readily implies the twin identities
\[
|||z|||_X = \|R_Y^{-1} B z\|_Y, \qquad ||||y||||_Y = \|R_X^{-1} B^* y\|_X \tag{3.11}
\]
for all 𝑧 ∈ 𝑋 and 𝑦 ∈ 𝑌, by the definitions of energy norm and optimal test norm.
Now we show that one may equivalently shorten the definition of the generalized
duality pairing by omitting one of the two equalities in (3.7).

Proposition 3.6. The identity |||𝑧|||𝑋 = ∥𝑧∥𝑋 holds for all 𝑧 ∈ 𝑋 if and only if
||||𝑦||||𝑌 = ∥𝑦∥𝑌 for all 𝑦 ∈ 𝑌. Therefore, whenever either equality holds, we have
∥𝑏∥ = 1 and 𝛾 = 1.

Proof. If |||𝑧|||𝑋 = ∥𝑧∥𝑋 for all 𝑧 ∈ 𝑋, then (3.6) and (3.11) imply
\[
||||y||||_Y = \sup_{0 \ne z \in X} \frac{|b(z, y)|}{\|z\|_X}
= \sup_{0 \ne z \in X} \frac{|(R_Y^{-1} B z, y)_Y|}{\|R_Y^{-1} B z\|_Y}.
\]
The last supremum equals ∥𝑦∥𝑌 since 𝑅𝑌⁻¹𝐵 : 𝑋 → 𝑌 is a bijection. The converse
is proved similarly using the other identity in (3.11). The last assertion on ∥𝑏∥ and
𝛾 immediately follows from (3.8).

Clearly one direction of Proposition 3.6 answers the question posed at the begin-
ning of this subsection. If we use the optimal test norm for 𝑌 , then the energy norm
coincides with the given ∥ · ∥ 𝑋 -norm, and Theorem 3.2(b) shows that the solution
of the IPG method is guaranteed to be the best approximation in the given 𝑋-
norm. However, as we shall see later, the optimal test norm is often not easily
computable in practice in the multi-dimensional examples we have in mind.
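In a finite-dimensional real setting the optimal test norm has an explicit Gram matrix, which makes Proposition 3.6 easy to observe. The sketch below assumes a hypothetical trial Gram matrix `G_X` and an invertible square matrix `B` with 𝑏(𝑧, 𝑦) = 𝑦ᵀ𝐵𝑧. By (3.11), the optimal test norm then has Gram matrix 𝐵 𝐺_𝑋⁻¹ 𝐵ᵀ.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

A = rng.standard_normal((n, n))
G_X = A @ A.T + n * np.eye(n)                      # Gram matrix of the given trial norm
B = rng.standard_normal((n, n)) + n * np.eye(n)    # b(z, y) = y^T B z, assumed invertible

# Optimal test norm (3.6): ||||y||||_Y^2 = y^T (B G_X^{-1} B^T) y.
G_opt = B @ np.linalg.solve(G_X, B.T)

# Energy norm induced by that test norm: |||z|||_X^2 = z^T (B^T G_opt^{-1} B) z.
energy_gram = B.T @ np.linalg.solve(G_opt, B)

# Singular values of b in orthonormalized coordinates give gamma and ||b||.
L_X = np.linalg.cholesky(G_X)
L_Y = np.linalg.cholesky(G_opt)
M = np.linalg.solve(L_Y, B)
M = np.linalg.solve(L_X, M.T).T                    # L_Y^{-1} B L_X^{-T}
singular_values = np.linalg.svd(M, compute_uv=False)
```

Here `energy_gram` reproduces `G_X` and all singular values equal one, that is, ∥𝑏∥ = 𝛾 = 1. The dense inverse of `G_X` used to form `G_opt` is exactly the kind of global computation that is usually unaffordable for multi-dimensional problems.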
Next, we contrast the previously introduced trial-to-test operator which produces
optimal test functions with an earlier trial-to-test operator given in Barrett and
Morton (1984). To this end, let us introduce an adaptation of their ideas to our
current Petrov–Galerkin setting. (They used equal trial and test spaces.) We define
the ‘Barrett–Morton trial-to-test operator’ 𝑇^BM : 𝑋 → 𝑌 by
\[
b(w, T^{\mathrm{BM}} z) = (w, z)_X \quad \text{for all } w, z \in X. \tag{3.12}
\]
Using the inverse of 𝐵∗, an equivalent characterization of 𝑇^BM is
\[
T^{\mathrm{BM}} = (B^*)^{-1} \circ R_X.
\]

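Comparing (3.12) with (2.2), we find two different trial-to-test mappings. The
difference between our 𝑇 = 𝑅𝑌⁻¹ ◦ 𝐵 (see (3.2)) and 𝑇^BM is illustrated in the
following diagram:
\[
\begin{array}{ccc}
X & \xrightarrow{\;B\;} & Y^* \\[2pt]
{\scriptstyle R_X} \big\downarrow & & \big\uparrow {\scriptstyle R_Y} \\[2pt]
X^* & \xleftarrow{\;B^*\;} & Y
\end{array}
\]
which is not commutative in general, and which further clarifies that 𝑇 ≠ 𝑇^BM in
general.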
Analogous to the IPG method of Definition 2.3, we can now consider a similar
method using 𝑇^BM in place of 𝑇. Using any given trial subspace 𝑋ℎ ⊂ 𝑋, consider
finding 𝑥ℎ^BM ∈ 𝑋ℎ that solves
\[
b(x_h^{\mathrm{BM}}, y_h) = \ell(y_h) \quad \text{for all } y_h = T^{\mathrm{BM}} z_h, \; z_h \in X_h. \tag{3.13}
\]
Subtracting this equation from (1.1) and substituting 𝑤 = 𝑥 − 𝑥ℎ^BM and 𝑧 = 𝑧ℎ in (3.12),
we learn that
\[
0 = b(x - x_h^{\mathrm{BM}}, T^{\mathrm{BM}} z_h) = (x - x_h^{\mathrm{BM}}, z_h)_X \quad \text{for all } z_h \in X_h.
\]

This implies the remarkable property that the solution 𝑥ℎ^BM ∈ 𝑋ℎ of the method
(3.13) equals the 𝑋-orthogonal projection of the exact solution 𝑥 and explains the
potential interest in the method (3.13). However, inverting 𝐵∗ to compute test
space basis functions is generally too expensive. In contrast, we will show in later
sections that the inversion of 𝑅𝑌 to compute 𝑇 can be realized locally if the problem
is reformulated adequately.
Nevertheless, at this point it is useful to note one scenario where 𝑇 and 𝑇 BM
coincide. This occurs when 𝑏 is a generalized duality pairing.
Proposition 3.7. If ∥𝑧∥𝑋 = |||𝑧|||𝑋 for all 𝑧 ∈ 𝑋, then 𝑇 = 𝑇^BM.
Proof. By (3.11), |||𝑧|||𝑋 = ∥𝑅𝑌⁻¹𝐵𝑧∥𝑌 = ∥𝑇𝑧∥𝑌. Hence, whenever ∥𝑧∥𝑋 = |||𝑧|||𝑋
for all 𝑧 ∈ 𝑋, by polarization, we have
\[
(w, z)_X = (T w, T z)_Y = b(w, T z) \quad \text{for all } z, w \in X,
\]
where we have used (2.2) in the last equality. Comparing this with (3.12), we find
that 𝑏(𝑤, 𝑇𝑧) = 𝑏(𝑤, 𝑇^BM 𝑧) for all 𝑤, 𝑧 ∈ 𝑋. Hence 𝑇 = 𝑇^BM by (1.2b).
Thus, when 𝑏 is a generalized duality pairing, the IPG method coincides with
the method (3.13) and the discrete solution equals the 𝑋-orthogonal projection of
the exact solution.
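The relation between the two trial-to-test operators can also be observed with matrices. The sketch below assumes real arithmetic, hypothetical random Gram matrices `G_X` and `G_Y`, an invertible `B` with 𝑏(𝑧, 𝑦) = 𝑦ᵀ𝐵𝑧, and a trial subspace spanned by the columns of `V`. With these conventions (3.12) gives 𝑇^BM = 𝐵⁻ᵀ𝐺_𝑋, while (2.2) gives 𝑇 = 𝐺_𝑌⁻¹𝐵.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2

def spd(size):
    """Random symmetric positive definite matrix (hypothetical Gram matrix)."""
    A = rng.standard_normal((size, size))
    return A @ A.T + size * np.eye(size)

G_X, G_Y = spd(n), spd(n)
B = rng.standard_normal((n, n)) + n * np.eye(n)    # b(z, y) = y^T B z, assumed invertible

T = np.linalg.solve(G_Y, B)          # (T z, y)_Y = b(z, y)
T_BM = np.linalg.solve(B.T, G_X)     # b(w, T_BM z) = (w, z)_X

V = rng.standard_normal((n, k))      # columns span the trial subspace X_h
x = rng.standard_normal(n)           # exact solution
ell = B @ x                          # load, so that b(x, y) = ell(y)

def petrov_galerkin(test_basis):
    """Solve b(x_h, y_h) = ell(y_h) for all y_h in span(test_basis)."""
    c = np.linalg.solve(test_basis.T @ B @ V, test_basis.T @ ell)
    return V @ c

x_bm = petrov_galerkin(T_BM @ V)
x_proj = V @ np.linalg.solve(V.T @ G_X @ V, V.T @ G_X @ x)   # X-orthogonal projection

# With the optimal test norm as the Y-norm, T coincides with T_BM.
G_opt = B @ np.linalg.solve(G_X, B.T)
T_opt = np.linalg.solve(G_opt, B)
```

For a generic `G_Y` the operators `T` and `T_BM` differ, yet the method (3.13) still returns the 𝑋-orthogonal projection; once the test norm is the optimal one, `T_opt` and `T_BM` agree, as Proposition 3.7 predicts.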
Example 3.8. It can be easily seen that the bilinear form 𝑏 in (2.8) of Example 2.6
is a generalized duality pairing. Hence the analytically solved expression for 𝑇,
given there in (2.10), coincides with 𝑇 BM .

Bibliographical notes. The interpretation of the IPG method as a residual
minimization method was pointed out in Demkowicz and Gopalakrishnan (2011b,
eq. (2.13)). The minimization of residual in dual norms was also the theme in
many previous works such as Bramble, Lazarov and Pasciak (1997) and Bramble
and Pasciak (2004), where the dual norm was replaced by a preconditioner action.
Where the DPG methods depart from these works, as will be clear from the next
section, is in the localization of the dual-norm computation through hybridization.
The interpretation of the DPG method as a mixed Galerkin method has parallels
in Cohen, Dahmen and Welper (2012). Theorems 3.2 and 3.4 can be seen in
Gopalakrishnan (2013), Bouma, Gopalakrishnan and Harb (2014) and Demkowicz
and Gopalakrishnan (2017). More recently, a substantial generalization of such
theorems to a Banach space setting was achieved by Muga and van der Zee (2020).
Optimal test norms were introduced in Zitelli et al. (2011). Generalized duality
pairings and non-trivial examples of them in the context of certain trace spaces can
be found in Demkowicz (2018). Proposition 3.7 connects our optimal test function
idea to the old concepts of Barrett and Morton (1984). There is a considerable lit-
erature in pursuit of making their idea more computationally feasible, e.g. Barbone
and Harari (2001), Celia, Russell, Ismael and Ewing (1990), Demkowicz and Oden
(1986b,a), Loula, Hughes and Franca (1987) and Loula and Fernandes (2009). We
instead switch course in the next section to pursue localization of the computation
of our trial-to-test operator 𝑇.

4. Ideal DPG methods

In this and the next section, we define DPG methods. Throughout, the boundary
value problems considered are posed on an open bounded domain 𝛺 ⊂ R 𝑁 with
Lipschitz boundary. We further assume that 𝛺 is partitioned into disjoint open
subsets 𝐾 (called elements), forming the collection 𝛺 ℎ (called mesh), such that the
union of 𝐾̄ for all 𝐾 ∈ 𝛺ℎ is 𝛺̄. We assume that the element boundaries 𝜕𝐾 are
Lipschitz so we can apply trace theorems on them in specific applications. The
shape of the elements is unimportant in this section. Let 𝑌(𝐾) denote a Hilbert
space of functions on an element 𝐾, with inner product (·, ·)𝑌(𝐾).
Definition 4.1. An ideal DPG method is an IPG method (as in Definition 2.3)
where 𝑌 is set to the Cartesian product of Hilbert spaces 𝑌(𝐾), that is,
\[
Y = \prod_{K \in \Omega_h} Y(K), \tag{4.1}
\]
endowed with the inner product
\[
(y, v)_Y = \sum_{K \in \Omega_h} (y_K, v_K)_{Y(K)} \quad \text{for all } y, v \in Y, \tag{4.2}
\]
where 𝑦𝐾 denotes the 𝑌(𝐾)-component of any 𝑦 in the Cartesian product (4.1).

Our interest in using such a product space for the test variable is the resulting
localization of the trial-to-test operator 𝑇. Note that to compute a basis for the
optimal test space, we must solve (2.2) to compute 𝑇 𝑧 for each 𝑧 in a basis of 𝑋ℎ .
That equation, (𝑇 𝑧, 𝑦)𝑌 = 𝑏(𝑧, 𝑦), decouples into independent equations on each
element, if 𝑌 has the form (4.1). Localization of 𝑇 refers to the fact that the part
of 𝑇 𝑧 on an element 𝐾, namely (𝑇 𝑧)𝐾 , can be computed, independently of other
elements, by solving
((𝑇 𝑧)𝐾 , 𝑦 𝐾 )𝑌 (𝐾) = 𝑏(𝑧, 𝑦 𝐾 ) for all 𝑦 𝐾 ∈ 𝑌 (𝐾). (4.3)
The adjective discontinuous in the name ‘DPG’ refers to the fact that test functions
in 𝑌 of the form (4.1) admit discontinuous functions with no continuity constraints
across element interfaces. For example, in many applications, we set 𝑌 to
\[
H^1(\Omega_h) := \{ v \in L^2(\Omega) : v|_K \in H^1(K) \text{ for all } K \in \Omega_h \},
\]
which can be identified with the Cartesian product
\[
H^1(\Omega_h) \equiv \prod_{K \in \Omega_h} H^1(K), \tag{4.4}
\]

and contains functions that are discontinuous across element interfaces. Collo-
quially, we say that 𝐻 1 (𝛺 ℎ ) is a broken Sobolev space, obtained by breaking the
inter-element continuity constraints of 𝐻 1 (𝛺). DPG methods are built using broken
Sobolev spaces as test spaces.
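The localization expressed by (4.3) amounts to the Gram matrix of a broken test space being block diagonal, with one block per element. The sketch below illustrates this with hypothetical data: three elements with made-up local test dimensions, random symmetric positive definite local Gram matrices, and a random matrix `B` for the form. It compares element-by-element solves with one global solve.

```python
import numpy as np

rng = np.random.default_rng(4)
element_dims = [3, 4, 2]             # dim of the discretized Y(K) per element (hypothetical)
n_trial = 5                          # number of trial basis functions

def spd(size):
    """Random symmetric positive definite matrix standing in for a local Gram matrix."""
    A = rng.standard_normal((size, size))
    return A @ A.T + size * np.eye(size)

G_blocks = [spd(d) for d in element_dims]
offsets = np.concatenate(([0], np.cumsum(element_dims)))
B = rng.standard_normal((offsets[-1], n_trial))    # rows grouped element by element

# Local solves of (4.3): one small system per element, independent of the others.
T_local = np.vstack([
    np.linalg.solve(G_K, B[offsets[i]:offsets[i + 1]])
    for i, G_K in enumerate(G_blocks)
])

# Reference: assemble the block-diagonal global Gram matrix and solve once.
G = np.zeros((offsets[-1], offsets[-1]))
for i, G_K in enumerate(G_blocks):
    s = slice(offsets[i], offsets[i + 1])
    G[s, s] = G_K
T_global = np.linalg.solve(G, B)
```

Both routes give the same optimal test functions, but the local one costs a sum of small solves and parallelizes trivially over elements. With a conforming test space such as 𝐻¹(𝛺) the Gram matrix would couple neighbouring elements and this shortcut would be lost.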
Example 4.2 (Laplace equation: primal DPG formulation). Let 𝑓 ∈ 𝐿 2 (𝛺) and
𝑢 satisfy
−Δ𝑢 = 𝑓 in 𝛺, (4.5a)
𝑢=0 on 𝜕𝛺. (4.5b)
The standard variational formulation for this problem finds 𝑢 in 𝐻˚ 1 (𝛺) such that
(grad 𝑢, grad 𝑣)𝛺 = ( 𝑓 , 𝑣)𝛺 for all 𝑣 ∈ 𝐻˚ 1 (𝛺). (4.6)
A different variational formulation is obtained if we multiply (4.5a) by a possibly
discontinuous test function 𝑦 ∈ 𝐻 1 (𝛺 ℎ ) (defined in (4.4)) and integrate by parts,
element by element. On a single element 𝐾 ∈ 𝛺 ℎ , we have
\[
\int_K \operatorname{grad} u \cdot \operatorname{grad} y - \int_{\partial K} (n \cdot \operatorname{grad} u) \, y = \int_K f y. \tag{4.7}
\]

The integral over 𝜕𝐾 must be interpreted as a duality pairing in 𝐻 1/2 (𝜕𝐾) if 𝑢

is not sufficiently regular. Recalling our notation for duality pairing in (3.5) and
letting 𝑛 · grad 𝑢 be an independent unknown denoted by 𝑞ˆ 𝑛 , we now derive a
Petrov–Galerkin formulation. To state it precisely, we use the following notation:
\[ (r,s)_h = \sum_{K\in\Omega_h} (r,s)_K, \qquad \langle \ell, w\rangle_h = \sum_{K\in\Omega_h} \langle \ell, w\rangle_{H^{1/2}(\partial K)}, \tag{4.8} \]
where (·, ·)𝐷 , for any domain 𝐷, denotes the 𝐿 2 (𝐷)-inner product and ⟨ℓ, ·⟩ 𝐻 1/2 (𝜕𝐾)
denotes the action of a conjugate linear functional ℓ ∈ 𝐻 −1/2 (𝜕𝐾) on a function in
𝐻 1/2 (𝜕𝐾). Define the element-by-element trace operator
\[ \operatorname{tr}_n : H(\operatorname{div},\Omega) \to \prod_{K\in\Omega_h} H^{-1/2}(\partial K), \qquad (\operatorname{tr}_n r)|_{\partial K} = r\cdot n|_{\partial K}. \tag{4.9} \]

Here and throughout, 𝑛 denotes the unit outward normal vector of a domain under
consideration, which is usually clear from the context, e.g. above 𝑛 is the outward
unit normal on each element boundary 𝜕𝐾. (On an interior interface shared by two
elements 𝐾± , the 𝑛 from 𝐾± will have opposite signs.) We endow the image of the
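trace map with a quotient norm,
\[ H^{-1/2}(\partial\Omega_h) = \operatorname{range}(\operatorname{tr}_n), \qquad \|\hat r_n\|_{H^{-1/2}(\partial\Omega_h)} = \inf_{q\,\in\,\operatorname{tr}_n^{-1}\{\hat r_n\}} \|q\|_{H(\operatorname{div},\Omega)}, \tag{4.10} \]
where the infimum is over the preimage
\[ \operatorname{tr}_n^{-1}\{\hat r_n\} = \{ q\in H(\operatorname{div},\Omega) : \operatorname{tr}_n(q) = \hat r_n \}. \]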
Since the element boundary traces of 𝑛 · grad 𝑢 appearing in (4.7) are in
𝐻 −1/2 (𝜕𝛺 ℎ ), we now have a trial space to place the interface variables. Given
a 𝑟ˆ𝑛 ∈ 𝐻 −1/2 (𝜕𝛺 ℎ ), note that for any 𝑣 ∈ 𝐻 1 (𝛺 ℎ ),
⟨𝑟ˆ𝑛 , 𝑣⟩ℎ = ⟨𝑛 · 𝑟, 𝑣⟩ℎ
for all 𝑟 ∈ 𝐻(div, 𝛺) with tr𝑛 (𝑟) = 𝑟ˆ𝑛 . The interior values of 𝑟 are not seen by
the right-hand side. When 𝑟 · 𝑛 is sufficiently smooth on each element interface,
one can give an intrinsic characterization by orienting each interface; see (5.16) of
Example 5.5.
With this notation, we can now give the Petrov–Galerkin formulation obtained
by summing up (4.7) over all 𝐾 ∈ 𝛺 ℎ . Set
𝑋 = 𝐻˚ 1 (𝛺) × 𝐻 −1/2 (𝜕𝛺 ℎ ), 𝑌 = 𝐻 1 (𝛺 ℎ ).
Then the PG formulation finds (𝑢, 𝑞ˆ 𝑛 ) ∈ 𝑋 satisfying
(grad 𝑢, grad 𝑣)ℎ − ⟨𝑞ˆ 𝑛 , 𝑣⟩ℎ = ( 𝑓 , 𝑣)𝛺 for all 𝑣 ∈ 𝑌 . (4.11)
This is a ‘hybrid’ form of the standard formulation (4.6). Although it is different
from the primal hybrid formulation of Raviart and Thomas (1977b), there are a
number of common features, including the use of the quotient norm of the type
(4.10). Since the test space 𝑌 = 𝐻 1 (𝛺 ℎ ) is a product space as in Definition 4.1, this
formulation, provided we verify its wellposedness (which is done below), admits
the construction of an ideal DPG method with localized optimal test space, known
as the primal DPG method for the Laplace equation.
Processes that arrive at reformulations of a problem using spaces of discontinuous
functions and new interface variables have traditionally been referred to as
hybridization; see e.g. Raviart and Thomas (1977b), Brezzi and Fortin (1991) or
Cockburn and Gopalakrishnan (2004). We have used the adjective ‘hybrid’ in the
above example following this tradition. To analyse hybrid formulations like those
of Example 4.2, we formulate a result in a general setting. Let 𝑋0 , 𝑋ˆ and 𝑌 denote
Hilbert spaces over C, put 𝑋 = 𝑋0 × 𝑋ˆ , and let 𝑏 0 : 𝑋0 ×𝑌 → C and 𝑏ˆ : 𝑋ˆ ×𝑌 → C
denote continuous sesquilinear forms. Then
\[ Y_0 = \{ y\in Y : \hat b(\hat x, y) = 0 \text{ for all } \hat x\in\hat X \} \tag{4.12a} \]
is a closed subspace of 𝑌 . Suppose there are positive constants 𝛾0 and 𝛾ˆ such that
\[ \gamma_0 \|x\|_{X_0} \le \sup_{0\ne y\in Y_0} \frac{|b_0(x,y)|}{\|y\|_Y} \quad \text{for all } x\in X_0, \text{ and} \tag{4.12b} \]
\[ \hat\gamma \|\hat x\|_{\hat X} \le \sup_{0\ne y\in Y} \frac{|\hat b(\hat x,y)|}{\|y\|_Y} \quad \text{for all } \hat x\in\hat X. \tag{4.12c} \]

Our abstraction of a hybrid formulation is based on the continuous sesquilinear
form
\[ b((x,\hat x), y) = b_0(x,y) + \hat b(\hat x, y), \]
over 𝑋 = 𝑋0 × 𝑋ˆ and 𝑌 . In examples, 𝑋ˆ will be a space of interface variables (on
element boundaries) and 𝑌 will be a space admitting functions with no continuity
constraints across element boundaries. Given an ℓ ∈ 𝑌 ∗ , we are interested in the
wellposedness of the hybrid problem to find 𝑥 ∈ 𝑋0 and 𝑥ˆ ∈ 𝑋ˆ satisfying
\[ b((x,\hat x), y) = \ell(y) \quad \text{for all } y\in Y, \tag{4.13} \]
in relation to the problem of finding 𝑥 ∈ 𝑋0 satisfying
𝑏 0 (𝑥, 𝑦) = ℓ(𝑦) for all 𝑦 ∈ 𝑌0 . (4.14)
Theorem 4.3 (Wellposedness of hybrid Petrov–Galerkin formulations). In the
setting of (4.12), we have
\[ \gamma_1 \|(x,\hat x)\|_X \le \sup_{0\ne y\in Y} \frac{|b((x,\hat x),y)|}{\|y\|_Y}, \tag{4.15} \]
where 𝛾1 is given by
\[ \frac{1}{\gamma_1^2} = \frac{1}{\gamma_0^2} + \frac{1}{\hat\gamma^2}\left(\frac{\|b_0\|}{\gamma_0} + 1\right)^2. \]
Moreover,
\[ Z = \{ y\in Y : b((x,\hat x),y) = 0 \text{ for all } x\in X_0 \text{ and } \hat x\in\hat X \} \quad\text{and}\quad Z_0 = \{ y\in Y_0 : b_0(x,y) = 0 \text{ for all } x\in X_0 \} \]
are equal:
\[ Z = Z_0. \tag{4.16} \]
Consequently, if 𝑍0 = {0}, then (4.13) is uniquely solvable, and furthermore, the
solution component 𝑥 from (4.13) coincides with the solution of (4.14).
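Proof. To prove (4.15), noting that
\[ \|(x,\hat x)\|_X^2 = \|x\|_{X_0}^2 + \|\hat x\|_{\hat X}^2, \]
we start by bounding ∥𝑥∥ 𝑋0 as follows:
\begin{align*}
\gamma_0 \|x\|_{X_0} &\le \sup_{0\ne y\in Y_0} \frac{|b_0(x,y)|}{\|y\|_Y} && \text{by (4.12b)} \\
&\le \sup_{0\ne y\in Y_0} \frac{|b_0(x,y) + \hat b(\hat x,y)|}{\|y\|_Y} && \text{by (4.12a)} \\
&\le \sup_{0\ne y\in Y} \frac{|b((x,\hat x),y)|}{\|y\|_Y} && \text{as } Y_0\subseteq Y.
\end{align*}
Next, to bound ∥𝑥ˆ∥ 𝑋ˆ , using (4.12c),
\begin{align*}
\hat\gamma \|\hat x\|_{\hat X} &\le \sup_{0\ne y\in Y} \frac{|\hat b(\hat x,y)|}{\|y\|_Y} = \sup_{0\ne y\in Y} \frac{|b((x,\hat x),y) - b_0(x,y)|}{\|y\|_Y} \\
&\le \|b_0\|\,\|x\|_{X_0} + \sup_{0\ne y\in Y} \frac{|b((x,\hat x),y)|}{\|y\|_Y}.
\end{align*}
Combining these bounds for ∥𝑥∥ 𝑋0 and ∥𝑥ˆ∥ 𝑋ˆ , we obtain (4.15).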
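For completeness, the combination step can be written out as follows. This is only a worked restatement of the two bounds above; the abbreviation $S$ for the supremum is introduced here for illustration and is not notation from the text.

```latex
\begin{align*}
S &:= \sup_{0\ne y\in Y} \frac{|b((x,\hat x),y)|}{\|y\|_Y}, \\
\|x\|_{X_0} &\le \frac{1}{\gamma_0}\, S, \\
\|\hat x\|_{\hat X} &\le \frac{1}{\hat\gamma}\bigl( \|b_0\|\,\|x\|_{X_0} + S \bigr)
  \le \frac{1}{\hat\gamma}\Bigl( \frac{\|b_0\|}{\gamma_0} + 1 \Bigr) S, \\
\|(x,\hat x)\|_X^2 &= \|x\|_{X_0}^2 + \|\hat x\|_{\hat X}^2
  \le \Bigl[ \frac{1}{\gamma_0^2} + \frac{1}{\hat\gamma^2}\Bigl( \frac{\|b_0\|}{\gamma_0} + 1 \Bigr)^2 \Bigr] S^2
  = \frac{S^2}{\gamma_1^2}.
\end{align*}
```

Taking square roots gives $\gamma_1 \|(x,\hat x)\|_X \le S$, which is (4.15).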
To prove the remaining claims, note that since we may choose 𝑥 and 𝑥ˆ independently
in the definition of 𝑍, a 𝑦 ∈ 𝑌 is in 𝑍 if and only if 𝑏 0 (𝑥, 𝑦) = 0 for all
𝑥 ∈ 𝑋0 and 𝑏ˆ(𝑥ˆ, 𝑦) = 0 for all 𝑥ˆ ∈ 𝑋ˆ . The latter holds if and only if 𝑦 ∈ 𝑌0 due
to (4.12a). Hence (4.16) follows. The unique solvability of both (4.13) and (4.14)
then follows from the equivalence of (1.2) and (1.1). Finally, restricting the test
space in (4.13) to 𝑌0 and using (4.12a), we find that its solution component 𝑥 must
also solve (4.14).
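The bound (4.15) can be checked numerically in a finite-dimensional model. The sketch below assumes Euclidean spaces in place of the Hilbert spaces of the theorem, with 𝑏 0 and 𝑏ˆ represented by random matrices `B0` and `Bh`; these names, the dimensions and the random data are illustrative choices, not part of the text. In this setting the constants in (4.12b) and (4.12c) are smallest singular values, and the inf-sup constant of the hybrid form is the smallest singular value of the concatenated matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 3, 2, 8   # dim X0, dim X-hat, dim Y (we need k - m >= n)

B0 = rng.standard_normal((k, n))   # b0(x, y) = y^T B0 x
Bh = rng.standard_normal((k, m))   # b-hat(xh, y) = y^T Bh xh

# Y0 = {y : y^T Bh = 0}: orthonormal basis taken from the full SVD of Bh.
U, s_hat, _ = np.linalg.svd(Bh, full_matrices=True)
Q = U[:, m:]                        # columns span the null space of Bh^T

gamma0 = np.linalg.svd(Q.T @ B0, compute_uv=False).min()   # constant in (4.12b)
gamma_hat = s_hat.min()                                     # constant in (4.12c)
norm_b0 = np.linalg.norm(B0, 2)                             # continuity constant of b0

# gamma1 as given in Theorem 4.3.
gamma1 = (1 / gamma0**2 + (norm_b0 / gamma0 + 1)**2 / gamma_hat**2) ** -0.5

# Actual inf-sup constant of b((x, xh), y) = y^T [B0 Bh] (x, xh).
gamma_true = np.linalg.svd(np.hstack([B0, Bh]), compute_uv=False).min()
print(gamma1, gamma_true)
```

As the theorem predicts, `gamma1` is a lower bound for `gamma_true`; it is typically not sharp.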
In particular examples, to apply the theorem, the main work is in verifying its
assumptions. We shall now see examples of how to do so. A one-dimensional
example is provided by a hybrid version of the formulation of Example 2.6 (using a
natural broken space). Its analysis can be found in Demkowicz and Gopalakrishnan
(2011b, § III). Here we proceed directly to the more interesting multi-dimensional
examples of Laplace and Maxwell equations.
Example 4.4 (Laplace equation: wellposedness of primal DPG formulation).
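Continuing Example 4.2, we fit it into the framework above by setting
\begin{align*}
X_0 &= \mathring{H}^1(\Omega), & Y_0 &= \mathring{H}^1(\Omega), \tag{4.17a} \\
\hat X &= H^{-1/2}(\partial\Omega_h), & Y &= H^1(\Omega_h), \tag{4.17b} \\
b_0(u,y) &= (\operatorname{grad} u, \operatorname{grad} y)_h, & \hat b(\hat q_n, y) &= \langle \hat q_n, y\rangle_h. \tag{4.17c}
\end{align*}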

To verify (4.12a), letting

𝑌˜ = {𝑦 ∈ 𝐻 1 (𝛺 ℎ ) : ⟨𝑞ˆ 𝑛 , 𝑦⟩ℎ = 0 for all 𝑞ˆ 𝑛 ∈ 𝐻 −1/2 (𝜕𝛺 ℎ )},
we must prove that 𝐻˚ 1 (𝛺) = 𝑌˜ . Recall that 𝑞ˆ 𝑛 above is always of the form
𝑛 · 𝑞 for some 𝑞 ∈ 𝐻(div, 𝛺). Let 𝑦 ∈ 𝐻˚ 1 (𝛺). Its zero trace implies that
⟨𝑦, 𝑞 · 𝑛⟩ 𝐻 1/2 (𝜕𝛺) = 0 for any 𝑞 ∈ 𝐻(div, 𝛺). Hence, two integrations by parts, one
element by element and the other over 𝛺, give
⟨𝑦, 𝑞 · 𝑛⟩ℎ = (𝑦, div 𝑞)𝛺 + (grad 𝑦, 𝑞)𝛺 = ⟨𝑦, 𝑞 · 𝑛⟩ 𝐻 1/2 (𝜕𝛺) , (4.18)

and since the last term vanishes, 𝑦 ∈ 𝑌˜ , i.e. 𝐻˚ 1 (𝛺) ⊆ 𝑌˜ .

For the reverse inclusion, consider a 𝑦 ∈ 𝑌˜ . Note that the distributional gradient
of a 𝑦 ∈ 𝐻 1 (𝛺 ℎ ) acting on a 𝜙 in the Schwartz test space D(𝛺) 𝑁 satisfies
(grad 𝑦)(𝜙) = −(𝑦, div 𝜙)𝛺 = (grad 𝑦, 𝜙)ℎ − ⟨𝑦, 𝑛 · 𝜙⟩ℎ
where we have integrated by parts, element by element. The last term vanishes by
the given condition on 𝑦. Hence grad 𝑦 ∈ 𝐿 2 (𝛺) 𝑁 . Now integrating by parts again,
but this time over 𝛺, we find, by (4.18), that ⟨𝑦, 𝑞 · 𝑛⟩ 𝐻 1/2 (𝜕𝛺) = ⟨𝑦, 𝑞 · 𝑛⟩ℎ = 0 for
all 𝑞 ∈ 𝐻(div, 𝛺). Hence 𝑦| 𝜕𝛺 = 0 and 𝑦 ∈ 𝐻˚ 1 (𝛺). Thus
𝐻˚ 1 (𝛺) = 𝑌˜ (4.19)
and (4.12a) holds.
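Condition (4.12b) obviously holds, since the 𝑏 0 (·, ·) set in (4.17) is coercive on
𝑌0 by the Friedrichs inequality. It only remains to verify (4.12c). To do so, given a
𝑞ˆ 𝑛 ∈ 𝐻 −1/2 (𝜕𝛺 ℎ ), consider 𝑞 ∈ 𝐻(div, 𝐾) and 𝑤 ∈ 𝐻 1 (𝐾) solving
\begin{align*}
-\operatorname{grad}(\operatorname{div} q) + q &= 0 \ \text{ in } K, & n\cdot q &= \hat q_n \ \text{ on } \partial K, \tag{4.20} \\
-\operatorname{div}(\operatorname{grad} w) + w &= 0 \ \text{ in } K, & \frac{\partial w}{\partial n} &= \hat q_n \ \text{ on } \partial K. \tag{4.21}
\end{align*}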
The boundary value problems (4.21) and (4.20) are equivalent in the sense that
𝑤 solves (4.21) if and only if 𝑞 = grad 𝑤 solves (4.20) and moreover ∥𝑤∥ 𝐻 1 (𝐾) =
∥𝑞∥ 𝐻(div,𝐾) . It is also obvious that from among all 𝐻(div, 𝐾)-extensions of 𝑞ˆ 𝑛 , the
solution of (4.20) has the minimal 𝐻(div, 𝐾)-norm, so
\begin{align*}
\|\hat q_n\|_{H^{-1/2}(\partial K)} = \|q\|_{H(\operatorname{div},K)} = \|w\|_{H^1(K)}
&= \sup_{0\ne v\in H^1(K)} \frac{|(\operatorname{grad} w, \operatorname{grad} v)_K + (w,v)_K|}{\|v\|_{H^1(K)}} \\
&= \sup_{0\ne v\in H^1(K)} \frac{|\langle \hat q_n, v\rangle_{H^{1/2}(\partial K)}|}{\|v\|_{H^1(K)}}, \tag{4.22}
\end{align*}
where we used the variational form of (4.21) in the last step. Squaring and summing
over all 𝐾 ∈ 𝛺 ℎ , we find that (4.12c) holds with 𝛾ˆ = 1. (More identities similar to
(4.22) appear in Theorem 4.6 below.)

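Having verified the assumptions, Theorem 4.3 now gives the inf-sup condition
\[ \|(u,\hat q_n)\|_{\mathring{H}^1(\Omega)\times H^{-1/2}(\partial\Omega_h)} \lesssim \sup_{0\ne y\in H^1(\Omega_h)} \frac{|(\operatorname{grad} u, \operatorname{grad} y)_h + \langle \hat q_n, y\rangle_h|}{\|y\|_{H^1(\Omega_h)}}, \tag{4.23} \]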
thus proving the wellposedness of the primal DPG formulation for the Laplace
equation.
Example 4.5 (Maxwell equations). We now develop and analyse a primal DPG
method for the cavity problem in electromagnetics. Let the cavity 𝛺 be an open
bounded contractible domain in R3 , on the boundary of which the so-called perfect
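electric conducting boundary condition is placed. Assuming that all time variations
are harmonic of frequency 𝜔 > 0, Maxwell equations in the cavity are
\begin{align*}
-\hat\imath\omega\mu H + \operatorname{curl} E &= 0 && \text{in } \Omega, \tag{4.24a} \\
-\hat\imath\omega\epsilon E - \operatorname{curl} H &= -J && \text{in } \Omega, \tag{4.24b} \\
n\times E &= 0 && \text{on } \partial\Omega. \tag{4.24c}
\end{align*}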
The functions 𝐸, 𝐻, 𝐽 : 𝛺 → C3 represent electric field, magnetic field and im-
posed current, respectively, and 𝚤ˆ denotes the imaginary unit. We assume that
the electromagnetic material properties 𝜖 and 𝜇 are bounded uniformly positive
functions on 𝛺. The number 𝜔 > 0 denotes a fixed wavenumber. Eliminating 𝐻
from (4.24a) and (4.24b), we obtain the second-order (non-elliptic) equation
curl 𝜇 −1 curl 𝐸 − 𝜔2 𝜖 𝐸 = 𝑓 , (4.25)
where 𝑓 = 𝚤ˆ𝜔𝐽. Let
\[ H(\operatorname{curl},\Omega) = \{ F\in L^2(\Omega)^3 : \operatorname{curl} F\in L^2(\Omega)^3 \} \]
and let 𝐻˚(curl, 𝛺) denote the subspace of vector fields in 𝐻(curl, 𝛺) with zero
tangential trace on 𝜕𝛺. A standard variational formulation for this problem is
obtained by multiplying (4.25) by a test function 𝐹 ∈ 𝐻˚(curl, 𝛺), integrating by
parts and using the boundary condition (4.24c): find 𝐸 ∈ 𝐻˚(curl, 𝛺) satisfying
\[ (\mu^{-1}\operatorname{curl} E, \operatorname{curl} F)_\Omega - \omega^2 (\epsilon E, F)_\Omega = \langle f, F\rangle \tag{4.26} \]
for any given 𝑓 ∈ 𝐻˚(curl, 𝛺)′ . It is well known (see Monk 2003) that (4.26) has
a unique solution for every 𝑓 ∈ 𝐻˚(curl, 𝛺)′ whenever 𝜔 is not a resonance of the
cavity 𝛺, an assumption we place throughout this example.
The primal DPG method for the electric cavity problem is obtained by multiplying
(4.25) by a test function 𝐹 in the ‘broken’ space
\[ H(\operatorname{curl},\Omega_h) = \prod_{K\in\Omega_h} H(\operatorname{curl},K) \]
and integrating by parts, element by element. On a single element 𝐾 ∈ 𝛺 ℎ , we get
\[ (\mu^{-1}\operatorname{curl} E, \operatorname{curl} F)_K + (n\times\mu^{-1}\operatorname{curl} E, F)_{\partial K} - \omega^2 (\epsilon E, F)_K = (f,F)_K. \tag{4.27} \]

To set the element boundary term in the right space, a space akin to (4.10), let us
recall a few pertinent results on tangential traces on Lipschitz boundaries.
The tangential trace maps
\[ E \mapsto \operatorname{tr}^K_{n\times} E := (n\times E)|_{\partial K}, \qquad E \mapsto \operatorname{tr}^K_{\top} E := n\times(E\times n)|_{\partial K} \equiv E_\top|_{\partial K}, \]
both well-defined for smooth vector fields 𝐸 on any mesh element 𝐾 ∈ 𝛺 ℎ , can
be extended to continuous linear maps
\[ \operatorname{tr}^K_{n\times} : H(\operatorname{curl},K) \to H^{-1/2}(\operatorname{div}_F,\partial K), \qquad \operatorname{tr}^K_{\top} : H(\operatorname{curl},K) \to H^{-1/2}(\operatorname{curl}_F,\partial K) \]
by the work of Buffa, Costabel and Sheen (2002), which contains the definitions of
the codomain spaces and the surface derivatives (divF and curlF ) above. Moreover,
their results imply that the integration-by-parts formula
\[ (\operatorname{curl} E, F)_K - (E, \operatorname{curl} F)_K = (n\times E, F)_{\partial K} \]
for smooth vector fields 𝐸 and 𝐹 on 𝐾 can be extended to 𝐸, 𝐹 ∈ 𝐻(curl, 𝐾),
with the understanding that the right-hand side above becomes a duality pairing
$\langle \operatorname{tr}^K_{n\times} E, \operatorname{tr}^K_{\top} F\rangle_{H^{-1/2}(\operatorname{curl}_F,\partial K)}$ between $H^{-1/2}(\operatorname{div}_F,\partial K)$ and $H^{-1/2}(\operatorname{curl}_F,\partial K)$.
Re-using the notation of (·, ·)ℎ and ⟨·, ·⟩ℎ in (4.8) by extending inner products to vector
fields in the obvious way and letting
\[ \langle n\times E, F\rangle_h = \sum_{K\in\Omega_h} \langle \operatorname{tr}^K_{n\times} E, \operatorname{tr}^K_{\top} F\rangle_{H^{-1/2}(\operatorname{curl}_F,\partial K)}, \]
we sum (4.27) over all 𝐾 ∈ 𝛺 ℎ to obtain
\[ (\mu^{-1}\operatorname{curl} E, \operatorname{curl} F)_h + \langle n\times\mu^{-1}\operatorname{curl} E, F\rangle_h - \omega^2 (\epsilon E, F)_h = (f,F)_h. \tag{4.28} \]
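To set the interface term in the right space, we need some more machinery.
Applying the trace operators $\operatorname{tr}^K_{n\times}$ and $\operatorname{tr}^K_{\top}$ element by element, we define
\[ \operatorname{tr}_{n\times} : H(\operatorname{curl},\Omega) \to \prod_{K\in\Omega_h} H^{-1/2}(\operatorname{div}_F,\partial K), \qquad (\operatorname{tr}_{n\times} E)|_{\partial K} = \operatorname{tr}^K_{n\times} E \equiv (n\times E)|_{\partial K}, \tag{4.29} \]
and
\[ \operatorname{tr}_{\top} : H(\operatorname{curl},\Omega) \to \prod_{K\in\Omega_h} H^{-1/2}(\operatorname{curl}_F,\partial K), \qquad (\operatorname{tr}_{\top} E)|_{\partial K} = \operatorname{tr}^K_{\top} E \equiv (n\times(E\times n))|_{\partial K}. \tag{4.30} \]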
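Next, analogous to what we did in (4.10), we define interface spaces as the ranges
of the above trace operators, endowed with a quotient norm, namely
\[ H^{-1/2}(\operatorname{div}_F,\partial\Omega_h) := \operatorname{range}(\operatorname{tr}_{n\times}), \qquad \|n\times\hat E\|_{H^{-1/2}(\operatorname{div}_F,\partial\Omega_h)} := \inf_{E\,\in\,\operatorname{tr}_{n\times}^{-1}\{n\times\hat E\}} \|E\|_{H(\operatorname{curl},\Omega)}, \]
and
\[ H^{-1/2}(\operatorname{curl}_F,\partial\Omega_h) := \operatorname{range}(\operatorname{tr}_{\top}), \qquad \|\hat E_\top\|_{H^{-1/2}(\operatorname{curl}_F,\partial\Omega_h)} := \inf_{E\,\in\,\operatorname{tr}_{\top}^{-1}\{\hat E_\top\}} \|E\|_{H(\operatorname{curl},\Omega)}, \tag{4.31} \]
where the preimage sets are
\[ \operatorname{tr}_{n\times}^{-1}\{n\times\hat E\} = \{ E\in H(\operatorname{curl},\Omega) : \operatorname{tr}^K_{n\times} E = (n\times\hat E)|_{\partial K} \text{ on each } K\in\Omega_h \} \]
and
\[ \operatorname{tr}_{\top}^{-1}\{\hat E_\top\} = \{ E\in H(\operatorname{curl},\Omega) : \operatorname{tr}^K_{\top} E = \hat E_\top|_{\partial K} \text{ on each } K\in\Omega_h \}. \]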
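Returning to (4.28), we now set $n\times\hat H = (\hat\imath\omega)^{-1}\, n\times\mu^{-1}\operatorname{curl} E$ to be an independent
unknown on element boundaries, to be found in 𝐻 −1/2 (divF , 𝜕𝛺 ℎ ). Then (4.27) leads
to the variational problem (4.13) with the following spaces and forms:
\begin{align*}
X_0 &= \mathring{H}(\operatorname{curl},\Omega), & Y &= H(\operatorname{curl},\Omega_h), \tag{4.32a} \\
\hat X &= H^{-1/2}(\operatorname{div}_F,\partial\Omega_h), & Y_0 &= \mathring{H}(\operatorname{curl},\Omega), \tag{4.32b} \\
b_0(E,F) &= (\mu^{-1}\operatorname{curl} E, \operatorname{curl} F)_h - \omega^2 (\epsilon E, F)_h, \tag{4.32c} \\
\hat b(n\times\hat H, F) &= \hat\imath\omega \langle n\times\hat H, F\rangle_h. \tag{4.32d}
\end{align*}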
This is the primal DPG formulation for the Maxwell cavity problem.
To prove that this formulation is wellposed, we verify the conditions of The-
orem 4.3. It is easy to verify (4.12a) by extending the same technique we used to
prove it in Example 4.4. Condition (4.12b) follows from the previously mentioned
unique solvability of (4.26) and the equivalence of (1.1) and (1.2). Finally, condi-
tion (4.12c) follows from (4.35c) of the next theorem (Theorem 4.6) below. Hence
Theorem 4.3 gives wellposedness of the formulation (4.32).
The next result shows that the argument we used to prove (4.22) in Example 4.4
can be generalized to get other similar identities for quotient norms. Define the
broken version of 𝐻(div, 𝛺) by
\[ H(\operatorname{div},\Omega_h) = \prod_{K\in\Omega_h} H(\operatorname{div},K). \tag{4.33} \]
Complementing the already defined trace operators tr𝑛 , tr𝑛× and tr⊤ (in (4.9), (4.29)
and (4.30), respectively), define the standard 𝐻 1 trace operator, applied elementwise,
by
\[ \operatorname{tr} : H^1(\Omega) \to \prod_{K\in\Omega_h} H^{1/2}(\partial K), \qquad (\operatorname{tr} u)|_{\partial K} = u|_{\partial K}, \tag{4.34} \]
and let
\[ H^{1/2}(\partial\Omega_h) := \operatorname{range}(\operatorname{tr}), \qquad \|\hat u\|_{H^{1/2}(\partial\Omega_h)} := \inf_{u\,\in\,\operatorname{tr}^{-1}\{\hat u\}} \|u\|_{H^1(\Omega)}, \]
where the quotient norm is a standard norm obtained by a ‘minimal energy extension’
as an infimum of the norm of all extensions of 𝑢ˆ in the preimage set
$\operatorname{tr}^{-1}\{\hat u\} = \{ u\in H^1(\Omega) : u|_{\partial K} = \hat u|_{\partial K} \}$. The identity (4.35b) below shows that
this infimum equals a supremum; in fact all identities of (4.35) are of a similar
‘inf = sup’ type.

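Theorem 4.6 (Interface duality). The following identities hold for any $\hat\sigma_n$ in
$H^{-1/2}(\partial\Omega_h)$, $\hat u$ in $H^{1/2}(\partial\Omega_h)$, $n\times\hat E$ in $H^{-1/2}(\operatorname{div}_F,\partial\Omega_h)$, and $\hat F_\top$ in $H^{-1/2}(\operatorname{curl}_F,\partial\Omega_h)$:
\begin{align*}
\|\hat\sigma_n\|_{H^{-1/2}(\partial\Omega_h)} &= \sup_{0\ne u\in H^1(\Omega_h)} \frac{|\langle \hat\sigma_n, u\rangle_h|}{\|u\|_{H^1(\Omega_h)}}, \tag{4.35a} \\
\|\hat u\|_{H^{1/2}(\partial\Omega_h)} &= \sup_{0\ne \sigma\in H(\operatorname{div},\Omega_h)} \frac{|\langle n\cdot\sigma, \hat u\rangle_h|}{\|\sigma\|_{H(\operatorname{div},\Omega_h)}}, \tag{4.35b} \\
\|n\times\hat E\|_{H^{-1/2}(\operatorname{div}_F,\partial\Omega_h)} &= \sup_{0\ne F\in H(\operatorname{curl},\Omega_h)} \frac{|\langle n\times\hat E, F\rangle_h|}{\|F\|_{H(\operatorname{curl},\Omega_h)}}, \tag{4.35c} \\
\|\hat F_\top\|_{H^{-1/2}(\operatorname{curl}_F,\partial\Omega_h)} &= \sup_{0\ne E\in H(\operatorname{curl},\Omega_h)} \frac{|\langle n\times E, \hat F_\top\rangle_h|}{\|E\|_{H(\operatorname{curl},\Omega_h)}}. \tag{4.35d}
\end{align*}
Furthermore,
(a) $v\in\mathring{H}^1(\Omega)$ if and only if $\langle \hat\sigma_n, v\rangle_h = 0$ for all $\hat\sigma_n\in H^{-1/2}(\partial\Omega_h)$,
(b) $\tau\in\mathring{H}(\operatorname{div},\Omega)$ if and only if $\langle \tau\cdot n, \hat u\rangle_h = 0$ for all $\hat u\in H^{1/2}(\partial\Omega_h)$, and
(c) $F\in\mathring{H}(\operatorname{curl},\Omega)$ if and only if $\langle n\times\hat E, F\rangle_h = 0$ for all $n\times\hat E\in H^{-1/2}(\operatorname{div}_F,\partial\Omega_h)$.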
Proof. The first equality was already proved in (4.22) and the argument is similar
for all identities of (4.35). So we outline the proof of only one more, namely
(4.35c).
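Given $n\times\hat E$ in $H^{-1/2}(\operatorname{div}_F,\partial\Omega_h)$, its norm on each element equals the norm of the
following minimum energy extension 𝐸 ∈ 𝐻(curl, 𝐾) satisfying
\[ \operatorname{curl}(\operatorname{curl} E) + E = 0 \ \text{ in } K, \qquad n\times E = n\times\hat E \ \text{ on } \partial K. \tag{4.36} \]
We compare this with the inverse of a Riesz map applied to a functional generated
by $n\times\hat E$, namely
\[ \operatorname{curl}(\operatorname{curl} F) + F = 0 \ \text{ in } K, \qquad n\times(\operatorname{curl} F) = n\times\hat E \ \text{ on } \partial K. \tag{4.37} \]
Note that 𝐹 solves (4.37) if and only if 𝐸 = curl 𝐹 solves (4.36). Moreover, (4.37)
implies that curl 𝐸 = −𝐹. Therefore ∥𝐸 ∥ 𝐻(curl,𝐾) = ∥𝐹 ∥ 𝐻(curl,𝐾) and
\begin{align*}
\|n\times\hat E\|_{H^{-1/2}(\operatorname{div}_F,\partial K)} = \|E\|_{H(\operatorname{curl},K)} = \|F\|_{H(\operatorname{curl},K)}
&= \sup_{0\ne G\in H(\operatorname{curl},K)} \frac{|(\operatorname{curl} F, \operatorname{curl} G)_K + (F,G)_K|}{\|G\|_{H(\operatorname{curl},K)}} \\
&= \sup_{0\ne G\in H(\operatorname{curl},K)} \frac{|\langle n\times\hat E, G_\top\rangle_{H^{-1/2}(\operatorname{curl}_F,\partial K)}|}{\|G\|_{H(\operatorname{curl},K)}}. \tag{4.38}
\end{align*}
Summing over all elements, and using
\[ \left( \sup_{0\ne G\in H(\operatorname{curl},\Omega_h)} \frac{|\langle n\times\hat E, G\rangle_h|}{\|G\|_{H(\operatorname{curl},\Omega_h)}} \right)^2 = \sum_{K\in\Omega_h} \left( \sup_{0\ne G\in H(\operatorname{curl},K)} \frac{|\langle n\times\hat E, G\rangle_{H^{-1/2}(\operatorname{curl}_F,\partial K)}|}{\|G\|_{H(\operatorname{curl},K)}} \right)^2, \]
the identity (4.35c) is proved.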

Proofs of all items (a)–(c) are similar to the previously detailed proof of (4.19),
so we omit them.
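It is interesting to observe that the norm of the dual space $H^{-1/2}(\operatorname{curl}_F,\partial K)^*$
occurs in the above proof implicitly. Indeed, since the $H^{-1/2}(\operatorname{curl}_F,\partial K)$-norm in
(4.31) is the infimum of extension norms, its dual norm equals
\begin{align*}
\|n\times\hat E\|_{H^{-1/2}(\operatorname{curl}_F,\partial K)^*}
&= \sup_{0\ne G_\top\in H^{-1/2}(\operatorname{curl}_F,\partial K)} \frac{|\langle n\times\hat E, G_\top\rangle_{H^{-1/2}(\operatorname{curl}_F,\partial K)}|}{\|G_\top\|_{H^{-1/2}(\operatorname{curl}_F,\partial K)}} \\
&= \sup_{0\ne G\in H(\operatorname{curl},K)} \frac{|\langle n\times\hat E, G_\top\rangle_{H^{-1/2}(\operatorname{curl}_F,\partial K)}|}{\|G\|_{H(\operatorname{curl},K)}},
\end{align*}
which is the supremum in (4.38). Thus the short argument in the previous proof
also shows that
\[ \|n\times\hat E\|_{H^{-1/2}(\operatorname{div}_F,\partial K)} = \|n\times\hat E\|_{H^{-1/2}(\operatorname{curl}_F,\partial K)^*}, \tag{4.39} \]
that is, the norms of $H^{-1/2}(\operatorname{div}_F,\partial K)$ and $H^{-1/2}(\operatorname{curl}_F,\partial K)^*$ are equal.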
Bibliographical notes. An alternative and longer proof of the wellposedness of
the DPG formulation for the Laplace equation in Example 4.4 first appeared in
Demkowicz and Gopalakrishnan (2013), using techniques developed for a slightly
different formulation for the same equation from Demkowicz and Gopalakrishnan
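(2011a). There, the adjoint inf-sup condition
\[ \|y\|_{H^1(\Omega_h)} \lesssim \sup_{0\ne (u,\hat q_n)\,\in\,\mathring{H}^1(\Omega)\times H^{-1/2}(\partial\Omega_h)} \frac{|(\operatorname{grad} u, \operatorname{grad} y)_h + \langle \hat q_n, y\rangle_h|}{\|(u,\hat q_n)\|_{\mathring{H}^1(\Omega)\times H^{-1/2}(\partial\Omega_h)}} \]
is proved instead of (4.23). Proving the adjoint inf-sup condition is an alternative
path to wellposedness, in view of the equivalence between (1.2) and (1.3). The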
shorter approach we presented in Example 4.4 is facilitated by the simple result of
Theorem 4.3. Similar results can be found in early works such as Brezzi and Fortin
(1991, p. 40), and even in other recent works, e.g. Garg, Prudhomme, van der
Zee and Carey (2014). Our discussions of Theorems 4.3, 4.6 and the Maxwell
case in Example 4.5 are drawn from Carstensen, Demkowicz and Gopalakrishnan
(2016), where further details can be found; see also Demkowicz (2018, § 4.2).
Further properties of the norms in (4.39), including intrinsic characterizations, can
be found in Buffa et al. (2002).

5. Practical DPG methods

Even if the construction of the test space is localized in the ideal DPG method of the
previous section, a practical issue still remains. The computation of (𝑇 𝑧)𝐾 in (4.3)
may require solving an infinite-dimensional problem on an element 𝐾 if 𝑌 (𝐾) is of
infinite dimension. To obtain a practical method, we must trade 𝑌 (𝐾) for a finite-
dimensional space. This section does so, provides a key general tool (Theorem 5.2)
involving Fortin operators to analyse the effect of this replacement on stability and
error estimates, and details error analyses of the practical DPG methods for Laplace
and Maxwell equations. Multiple subsections on Fortin operators show various
techniques to construct Fortin operators that satisfy certain moment conditions
needed for DPG analyses.
We start by introducing 𝑌 𝑟 , a finite-dimensional subspace of 𝑌 , where 𝑟 is related
to its finite dimension. To retain the localization, when 𝑌 is a Cartesian product as
in (4.1), the subspace 𝑌 𝑟 is assumed to be of a similar form,
\[ Y^r = \prod_{K\in\Omega_h} Y^r(K), \qquad Y^r(K)\subseteq Y(K). \tag{5.1} \]
𝐾 ∈𝛺ℎ

In analogy with (2.2), let 𝑇 𝑟 : 𝑋 → 𝑌 𝑟 be defined by

(𝑇 𝑟 𝑤, 𝑦)𝑌 = 𝑏(𝑤, 𝑦) for all 𝑦 ∈ 𝑌 𝑟 . (5.2)
Then (𝑇 𝑟 𝑤)𝐾 ∈ 𝑌 𝑟 (𝐾), the component of 𝑇 𝑟 𝑤 in element 𝐾, is computed locally
within 𝐾 by
((𝑇 𝑟 𝑤)𝐾 , 𝑦 𝐾 )𝑌 (𝐾) = 𝑏(𝑤, 𝑦 𝐾 ) for all 𝑦 𝐾 ∈ 𝑌 𝑟 (𝐾). (5.3)
A practical method is obtained using
\[ Y_h^r := T^r(X_h) \]
in place of the exactly optimal test space $Y_h^{\mathrm{opt}}$ of (2.1).
Definition 5.1. A (practical) DPG method is a method that finds 𝑥 ℎ ∈ 𝑋ℎ satis-
fying
𝑏(𝑥 ℎ , 𝑦) = ℓ(𝑦) for all 𝑦 ∈ 𝑌ℎ𝑟 , (5.4)
where 𝑌ℎ𝑟 is computed locally using 𝑇 𝑟 by (5.3) in a finite-dimensional 𝑌 𝑟 of the
product form (5.1).
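At the level of matrices, Definition 5.1 leads to a small linear-algebra recipe. The sketch below is a minimal real-valued illustration under stated assumptions: `G` stands for the Gram matrix of a basis of 𝑌 𝑟 in the 𝑌 -inner product, `B` for the matrix of 𝑏(·, ·) between a basis of 𝑋ℎ and that basis of 𝑌 𝑟 , and `l` for the load vector. All three are filled with random data here, and the names are hypothetical. The final cross-check uses the standard linear-algebra fact that the resulting system coincides with the normal equations of a weighted least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n_test, n_trial = 10, 4

# Hypothetical discrete data (real-valued case):
#   G[i, j] = (y_j, y_i)_Y  Gram matrix of a basis of Y^r (block diagonal in practice),
#   B[i, j] = b(x_j, y_i)   with x_j a basis of X_h,
#   l[i]    = ell(y_i).
A = rng.standard_normal((n_test, n_test))
G = A @ A.T + n_test * np.eye(n_test)
B = rng.standard_normal((n_test, n_trial))
l = rng.standard_normal(n_test)

# Columns of T hold the coefficients of the test functions T^r x_j, cf. (5.2).
T = np.linalg.solve(G, B)

# Testing (5.4) with y = T^r x_i gives the square system (T^T B) x = T^T l.
K = T.T @ B
x_dpg = np.linalg.solve(K, T.T @ l)

# Cross-check: the same x minimizes the residual l - B x in the norm induced by G^{-1}.
L = np.linalg.cholesky(G)                      # G = L L^T
x_ls, *_ = np.linalg.lstsq(np.linalg.solve(L, B), np.linalg.solve(L, l), rcond=None)
print(np.abs(x_dpg - x_ls).max())
```

The matrix `K` equals `B.T @ inv(G) @ B` and is therefore symmetric positive definite whenever `B` has full column rank, even if the underlying form 𝑏 is not coercive.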

5.1. A general DPG convergence theorem

In general, $T^r \ne T$, and the test space in the practical DPG method, $Y_h^r \ne Y_h^{\mathrm{opt}}$, is
only an inexact version of the optimal test space. Hence, to obtain an error estimate
for the practical DPG method, we cannot rely on the prior theory for the ideal DPG
method. However, imposing an extra condition (see (5.5) below) gives a simple
error analysis, as shown next. The condition involves an operator 𝛱 , which we
shall refer to as a ‘Fortin operator’, based on similar such operators in the study of
mixed methods (Brezzi and Fortin 1991).
Theorem 5.2 (Fortin operator gives DPG convergence). Suppose (1.3) holds,
𝑋ℎ ⊂ 𝑋 and 𝑌 𝑟 ⊂ 𝑌 . Assume that there is a bounded linear operator 𝛱 : 𝑌 → 𝑌 𝑟 ,
of operator norm ∥𝛱 ∥, such that for all 𝑤 ℎ ∈ 𝑋ℎ and all 𝑣 ∈ 𝑌 ,
\[ b(w_h, v - \Pi v) = 0. \tag{5.5} \]
Then the DPG method (5.4) is uniquely solvable for 𝑥 ℎ and
\[ \|x - x_h\|_X \le \frac{\|b\|\,\|\Pi\|}{\gamma} \inf_{z_h\in X_h} \|x - z_h\|_X, \tag{5.6} \]
where 𝑥 is the unique exact solution of (1.1).

Proof. The proof proceeds by verifying the assumptions of Theorem 1.1. Let us
first prove that (5.5) implies that
\[ T^r : X_h \to Y^r \ \text{ is injective.} \tag{5.7} \]
Indeed, if $T^r w_h = 0$ for some $w_h\in X_h$, then by (5.2), $b(w_h, y^r) = 0$ for all
$y^r\in Y^r$, which implies that $b(w_h, \Pi y) = 0$ for all $y\in Y$. But (5.5) then shows that
$b(w_h, y) = 0$ for all $y\in Y$, so by (1.3), $w_h = 0$. Therefore we have verified that
$\dim(Y_h^r) = \dim(X_h)$.
To verify the inf-sup condition, fix an arbitrary 𝑧 ℎ ∈ 𝑋ℎ , and let
\[ s_0 = \sup_{0\ne y\in Y} \frac{|b(z_h,y)|}{\|y\|_Y}, \qquad s_1 = \sup_{0\ne y^r\in Y^r} \frac{|b(z_h,y^r)|}{\|y^r\|_Y}, \qquad s_2 = \sup_{0\ne y_h^r\in Y_h^r} \frac{|b(z_h,y_h^r)|}{\|y_h^r\|_Y}. \]
The result will follow from Theorem 1.1 once we prove the discrete inf-sup condition
\[ \gamma \|\Pi\|^{-1} \|z_h\|_X \le s_2. \tag{5.8} \]
We proceed to bound ∥𝑧 ℎ ∥ 𝑋 using 𝑠0 , then 𝑠1 , and finally 𝑠2 . Since (1.3) is
equivalent to (1.2), the inf-sup condition 𝛾∥𝑧 ℎ ∥ 𝑋 ≤ 𝑠0 holds. Hence (5.5) implies
\begin{align*}
\gamma \|z_h\|_X &\le \sup_{0\ne y\in Y} \frac{|b(z_h,y)|}{\|y\|_Y} = \sup_{0\ne y\in Y} \frac{|b(z_h,\Pi y)|}{\|y\|_Y} \\
&\le \sup_{0\ne y\in Y} \frac{|b(z_h,\Pi y)|}{\|\Pi\|^{-1}\|\Pi y\|_Y} \le \sup_{0\ne y^r\in Y^r} \frac{|b(z_h,y^r)|}{\|\Pi\|^{-1}\|y^r\|_Y},
\end{align*}
that is, 𝛾 ∥𝛱 ∥ −1 ∥𝑧 ℎ ∥ 𝑋 ≤ 𝑠1 . To finish the proof of (5.8), it suffices to show that
𝑠1 ≤ 𝑠2 . The argument of Proposition 2.1 shows that the supremum 𝑠1 is attained
at 𝑇 𝑟 𝑧 ℎ , so
\[ s_1 = \frac{(T^r z_h, T^r z_h)_Y}{\|T^r z_h\|_Y} \le \sup_{0\ne y_h^r\in Y_h^r} \frac{|(T^r z_h, y_h^r)_Y|}{\|y_h^r\|_Y} = s_2. \]
This shows (5.8) and finishes the proof.
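The mechanism of this proof can be observed in a finite-dimensional model. The sketch below assumes Euclidean spaces, takes 𝑋ℎ = 𝑋 for simplicity, represents 𝑏 by a random matrix `B`, and takes 𝑌 𝑟 to be the span of the orthonormal columns of `V`. The particular Fortin operator built here from a pseudo-inverse is just one convenient choice made for illustration; it is not a construction taken from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
k, d, n = 12, 7, 3      # dim Y, dim Y^r, dim X_h (here X_h = X)

B = rng.standard_normal((k, n))                    # b(x, y) = y^T B x
V, _ = np.linalg.qr(rng.standard_normal((k, d)))   # orthonormal basis of Y^r

gamma = np.linalg.svd(B, compute_uv=False).min()   # continuous inf-sup constant

# One possible Fortin operator Pi = V M with B^T (I - Pi) = 0, i.e. (5.5).
# M is a minimal Frobenius-norm solution of (B^T V) M = B^T, via the pseudo-inverse.
M = np.linalg.pinv(B.T @ V) @ B.T
Pi = V @ M
fortin_defect = np.abs(B.T @ (np.eye(k) - Pi)).max()

# Discrete inf-sup constant over Y^r, compared with gamma / ||Pi|| as in (5.8).
gamma_discrete = np.linalg.svd(V.T @ B, compute_uv=False).min()
lower_bound = gamma / np.linalg.norm(Pi, 2)
print(fortin_defect, lower_bound, gamma_discrete)
```

In this Euclidean model the suprema 𝑠1 and 𝑠2 coincide, so `gamma_discrete` plays the role of the discrete inf-sup constant, and `lower_bound` is the guaranteed value 𝛾 ∥𝛱 ∥ −1 .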

Example 5.3 (Test spaces containing the optimal test functions). Consider the
DPG method obtained using
\[ Y^r \supseteq Y_h^{\mathrm{opt}}. \tag{5.9} \]
Then, setting 𝛱 to 𝛱𝑌 𝑟 , the 𝑌 -orthogonal projection into 𝑌 𝑟 , observe that for any
𝑧 ℎ ∈ 𝑋ℎ and any 𝑦 ∈ 𝑌 ,
\begin{align*}
b(z_h, y - \Pi_{Y^r} y) &= (T z_h, y - \Pi_{Y^r} y)_Y && \text{by (2.2)} \\
&= 0 && \text{since } T z_h\in Y^r.
\end{align*}
Hence Theorem 5.2 applies, and moreover, in (5.6) we may set ∥𝛱 ∥ = 1 since 𝛱
is an orthogonal projection. Thus the DPG solution, in this case, satisfies exactly
the same error estimate (2.4) as the ideal Petrov–Galerkin method.
More can be said by noting that (5.9) implies
\[ T^r w = T w, \qquad w\in X_h. \tag{5.10} \]
Indeed, restricting the test functions 𝑦 in the defining equation (2.2) for 𝑇 𝑤 to
$y = y^r\in Y^r$, we find that 𝑇 𝑤 ∈ 𝑌 𝑟 also satisfies equation (5.2) defining 𝑇 𝑟 𝑤, thus proving
the equality of 𝑇 𝑤 and 𝑇 𝑟 𝑤 stated in (5.10). It then immediately implies that the
solution of the IPG method with the optimal test space $Y_h^{\mathrm{opt}}$ and the solution of the
DPG method with a test space 𝑌 𝑟 satisfying (5.9) must coincide.
Since (5.9) seldom holds in practical multi-dimensional examples, typical ap-
plications of Theorem 5.2 involve more complex Fortin operators 𝛱 , as we shall
see next. Nonetheless, this discussion shows that enlarging the test space beyond
the optimal test space does not degrade stability or error estimates.
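The observation of Example 5.3 that an enriched test space containing the optimal test functions reproduces 𝑇 on 𝑋ℎ can be verified in a small matrix model. The sketch assumes a random symmetric positive definite Gram matrix `G` for the 𝑌 -inner product and a random matrix `B` for the form 𝑏 restricted to a basis of 𝑋ℎ ; the extra test directions are arbitrary. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
k, n, extra = 9, 3, 2    # dim Y, dim X_h, number of additional test directions

A = rng.standard_normal((k, k))
G = A @ A.T + k * np.eye(k)              # Gram matrix of the Y-inner product
B = rng.standard_normal((k, n))          # b(x_j, y_i) for a basis x_j of X_h

T = np.linalg.solve(G, B)                # exact trial-to-test operator on X_h, cf. (2.2)

# Y^r is spanned by the optimal test functions plus extra directions, so (5.9) holds.
V = np.hstack([T, rng.standard_normal((k, extra))])

# T^r from (5.2): Galerkin projection of the Riesz problem onto span(V).
Tr = V @ np.linalg.solve(V.T @ G @ V, V.T @ B)

print(np.abs(Tr - T).max())
```

The difference is at the level of rounding error, so the practical and ideal test functions agree in this setting.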
Bibliographical notes. The result of Theorem 5.2 and several of its applications, in-
cluding Fortin operators useful for analysing DPG methods for the Poisson equation
(see Example 5.5 below) and the elasticity equation (not discussed in this review),
were presented first in Gopalakrishnan and Qiu (2014). Fortin operators for DPG
analysis of plate-bending problems were given in Führer and Heuer (2019).

5.2. First example of a non-trivial DPG Fortin operator

As previously mentioned, DPG methods use test spaces of the product form (4.1),
usually obtained as broken Sobolev spaces. Construction of Fortin operators on
broken Sobolev spaces can therefore be done focusing only on one element. We
proceed to construct a local Fortin operator on the broken 𝐻 1 space and use it to
analyse the primal DPG method for the Laplace example.
From now on, we assume that the mesh 𝛺 ℎ is a geometrically conforming finite
element mesh of simplicial elements. Let △ 𝑗 𝐾 denote the set of 𝑗-dimensional
subsimplices of an 𝑁-simplex 𝐾. The set of mesh facets, denoted by Fℎ , is the
union of △ 𝑁 −1 𝐾 for all 𝐾 ∈ 𝛺 ℎ . Let 𝑃 𝑝 (𝐷) denote the space of polynomials of

degree at most 𝑝 restricted to a domain 𝐷, and let

𝑅 𝑝 (𝐷) = 𝑃 𝑝−1 (𝐷) 𝑁 + 𝑥𝑃 𝑝−1 (𝐷) (5.11)
denote the Raviart–Thomas element (Raviart and Thomas 1977a) which generates
the finite element space
𝑅 ℎ𝑝 = {𝑟 ∈ 𝐻(div, 𝛺) : 𝑟 | 𝐾 ∈ 𝑅 𝑝 (𝐾)}. (5.12)
We shall also use the space
Ö
𝑃 𝑝 (𝛺 ℎ ) = 𝑃 𝑝 (𝐾),
𝐾 ∈𝛺ℎ

often used in discontinuous Galerkin (DG) methods. Let ℎ 𝐾 = diam(𝐾). We write

𝐴≲𝐵
to indicate that the inequality 𝐴 ≤ 𝐶 𝐵 holds with some constant 𝐶 (whose value at
different occurrences may differ) independent of ℎ 𝐾 but possibly dependent on the
shape regularity of 𝐾 and the polynomial degree 𝑝. We will prove the following
theorem shortly after indicating how it is applied in a DPG method.

Theorem 5.4 (A local Fortin operator for 𝐻 1 in 𝑁 dimensions). Let 𝑣 ∈ 𝐻 1 (𝐾)
on an 𝑁-simplex 𝐾 and let 𝑟 = 𝑝 + 𝑁. There is a locally constructible continuous
linear operator $\Pi_r^{\mathrm{grad}} : H^1(K) \to P_r(K)$ satisfying
\begin{align*}
(\Pi_r^{\mathrm{grad}} v - v, q)_K &= 0 && \text{for all } q\in P_{p-1}(K), \tag{5.13a} \\
(\Pi_r^{\mathrm{grad}} v - v, \mu)_F &= 0 && \text{for all } \mu\in P_p(F),\ F\in\triangle_{N-1}K, \tag{5.13b} \\
\|\Pi_r^{\mathrm{grad}} v\|_{H^1(K)} &\lesssim \|v\|_{H^1(K)} && \text{for all } v\in H^1(K), \tag{5.13c} \\
\Pi_r^{\mathrm{grad}} c &= c && \text{for all constant functions } c. \tag{5.13d}
\end{align*}
This result holds for all integers 𝑝 ≥ 0 with the understanding that when 𝑝 = 0,
condition (5.13a) is vacuous.

Example 5.5 (Laplace equation: discrete stability and error estimates). We

now analyse a discretization of the primal DPG formulation of Example 4.4, making
critical use of Theorem 5.4. In particular, we shall see that the moment conditions
(5.13a)–(5.13b) help us verify the Fortin property (5.5).
Recall the trial and test spaces set in (4.17). The sesquilinear form of the problem
on 𝑋 × 𝑌 with 𝑋 = 𝑋0 × 𝑋ˆ = 𝐻˚ 1 (𝛺) × 𝐻 −1/2 (𝜕𝛺 ℎ ) and 𝑌 = 𝐻 1 (𝛺 ℎ ) is
𝑏((𝑢, 𝑞ˆ 𝑛 ), 𝑦) = (grad 𝑢, grad 𝑦)ℎ + ⟨𝑞ˆ 𝑛 , 𝑦⟩ℎ . (5.14)
Recalling the Raviart–Thomas space defined in (5.12) and the element-by-element
trace operator tr𝑛 defined in (4.9), set the discrete trial and test spaces by 𝑋ℎ = 𝑋0,ℎ × 𝑋ˆ ℎ and 𝑌ℎ ,
where
\begin{align*}
X_{0,h} &= \{ w\in\mathring{H}^1(\Omega) : w|_K\in P_{p+1}(K) \text{ for all } K\in\Omega_h \}, \tag{5.15a} \\
\hat X_h &= \operatorname{tr}_n R_h^{p+1}, \tag{5.15b} \\
Y_h &= P_{p+N}(\Omega_h) \tag{5.15c}
\end{align*}
for some degree 𝑝 ≥ 0. Clearly, the above set 𝑋0,ℎ is a standard Lagrange finite
element subspace of 𝐻˚ 1 (𝛺) and 𝑌ℎ is a standard DG space. Also, the space 𝑋ˆ ℎ set
above is a subspace of 𝑋ˆ = 𝐻 −1/2 (𝜕𝛺 ℎ ) since 𝑅 ℎ𝑝+1 ⊆ 𝐻(div, 𝛺). An alternative
characterization of 𝑋ˆ ℎ = tr𝑛 𝑅 ℎ𝑝+1 can be given assuming that every 𝐹 ∈ Fℎ is
provided a fixed unit normal 𝑛 𝐹 , which equals the outward pointing unit normal 𝑛
if 𝐹 ⊂ 𝜕𝛺, and equals either 𝑛 or −𝑛 on an interior facet 𝐹 shared by an element
𝐾 with unit outward normal 𝑛. Then it is easy to see from the well-known degrees
of freedom of the Raviart–Thomas space that
𝑋ˆ ℎ = {𝑟ˆ𝑛 : on every 𝐾 ∈ 𝛺 ℎ and each 𝐹 ∈ △ 𝑁 −1 𝐾,
there is a 𝜇 ∈ 𝑃 𝑝 (𝐹) such that 𝑟ˆ𝑛 | 𝜕𝐾 = (𝜇𝑛 𝐹 ) · 𝑛| 𝜕𝐾 }. (5.16)
Consequently, one may choose to implement 𝑋ˆ ℎ without using 𝑅 ℎ𝑝+1 . An imple-
mentation of (5.16) can proceed using only the standard polynomial space 𝑃 𝑝 (𝐹)
on each facet 𝐹 ∈ Fℎ together with some fixed facet orientation given by 𝑛 𝐹 .
Let us examine what the Fortin condition (5.5) entails for this discrete setting.
Let $(w_h, \hat r_h)\in X_h$. Since $\hat r_h = r_h\cdot n$ for some $r_h\in R_h^{p+1}$, using the 𝑏(·, ·) in (5.14),
condition (5.5) reads as follows:
\[ (\operatorname{grad} w_h, \operatorname{grad}(y - \Pi y))_h + \langle r_h\cdot n, y - \Pi y\rangle_h = 0 \tag{5.17} \]
for all $w_h\in X_{0,h}$ and $r_h\in R_h^{p+1}$. By integration by parts, we see that (5.17) is
implied by
\[ (y - \Pi y, \Delta w_h)_K = 0 \quad\text{and}\quad (y - \Pi y, (\operatorname{grad} w_h - r_h)\cdot n)_{\partial K} = 0 \]
on every 𝐾 ∈ 𝛺 ℎ . Once 𝛱 is set to $\Pi_r^{\mathrm{grad}}$ of Theorem 5.4, these two identities follow
from (5.13a) and (5.13b), respectively, and we are ready to apply Theorem 5.2. Let
(𝑢, 𝑞ˆ 𝑛 ) ∈ 𝑋 be the exact solution, and let 𝑞 = grad 𝑢, so that 𝑞ˆ 𝑛 = tr𝑛 (𝑞). If
𝑢 ℎ ∈ 𝑋0,ℎ and 𝑞ˆ 𝑛,ℎ ∈ 𝑋ˆ ℎ together form the solution of the practical DPG method
with discrete spaces set by (5.15), then Theorem 5.2 yields
\[ \|u - u_h\|_{H^1(\Omega)} + \|\hat q_n - \hat q_{n,h}\|_{H^{-1/2}(\partial\Omega_h)} \lesssim \inf_{(w_h,\hat r_{n,h})\in X_h} \Bigl[ \|u - w_h\|_{H^1(\Omega)} + \|\hat q_n - \hat r_{n,h}\|_{H^{-1/2}(\partial\Omega_h)} \Bigr]. \]

To obtain convergence rates, the standard approximation rates for the Lagrange
finite element space 𝑋0,ℎ ,
\[ \inf_{w_h\in X_{0,h}} \|u - w_h\|_{H^1(\Omega)} \lesssim h^s |u|_{H^{1+s}(\Omega)}, \qquad 0\le s\le p+1, \tag{5.18} \]
may be used to bound the first term in the infimum. For the other term we use the
definition of the 𝐻 −1/2 (𝜕𝛺 ℎ )-norm via the minimal extension norm, that is,
\[ \inf_{\hat r_{n,h}\in\hat X_h} \|\hat q_n - \hat r_{n,h}\|_{H^{-1/2}(\partial\Omega_h)} = \inf_{r_h\in R_h^{p+1}} \|q - r_h\|_{H(\operatorname{div},\Omega)}. \tag{5.19} \]
To obtain convergence rates from this, we use the standard Raviart–Thomas interpolant
$\Pi_R q\in R_h^{p+1}$, which is well-defined when 𝑞 ∈ 𝐻 𝑠 (𝛺) 𝑁 ∩ 𝐻(div, 𝛺) with
𝑠 > 1/2, together with its commutativity property
\[ \operatorname{div} \Pi_R q = \Pi_p \operatorname{div} q, \tag{5.20} \]
where 𝛱 𝑝 denotes the 𝐿 2 (𝛺)-orthogonal projection into 𝑃 𝑝 (𝛺 ℎ ), as follows:
\begin{align*}
\inf_{r_h\in R_h^{p+1}} \|q - r_h\|_{H(\operatorname{div},\Omega)}^2 &\le \|q - \Pi_R q\|_\Omega^2 + \|\operatorname{div}(q - \Pi_R q)\|_\Omega^2 \\
&\le \|q - \Pi_R q\|_\Omega^2 + \|(I - \Pi_p)\operatorname{div} q\|_\Omega^2 \\
&\lesssim h^{2s} |q|_{H^s(\Omega)}^2 + h^{2s} |\operatorname{div} q|_{H^s(\Omega)}^2, \tag{5.21}
\end{align*}
by the usual Bramble–Hilbert argument. Combining (5.18) and (5.21), we obtain
\[ \|u - u_h\|_{H^1(\Omega)} + \|\hat q_n - \hat q_{n,h}\|_{H^{-1/2}(\partial\Omega_h)} \lesssim h^s |u|_{H^{1+s}(\Omega)} + h^s |\Delta u|_{H^s(\Omega)}, \tag{5.22} \]
for 1/2 < 𝑠 ≤ 𝑝 + 1.
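Although the convergence rate with respect to ℎ in (5.22) is optimal, the last
term demands too much regularity. In the remainder of this example, we show how
to improve the argument using (4.35a) of Theorem 4.6. Instead of (5.19), we start
by applying (4.35a),
\begin{align*}
\inf_{\hat r_{n,h}\in\hat X_h} \|\hat q_n - \hat r_{n,h}\|_{H^{-1/2}(\partial\Omega_h)} &\le \|\operatorname{tr}_n(q - \Pi_R q)\|_{H^{-1/2}(\partial\Omega_h)} \\
&= \sup_{0\ne y\in H^1(\Omega_h)} \frac{|\langle \operatorname{tr}_n(q - \Pi_R q), y\rangle_h|}{\|y\|_{H^1(\Omega_h)}}. \tag{5.23}
\end{align*}
The numerator above, for any 𝑦 ∈ 𝐻 1 (𝛺 ℎ ), satisfies
\begin{align*}
\langle \operatorname{tr}_n(q - \Pi_R q), y\rangle_h &= (q - \Pi_R q, \operatorname{grad} y)_h + (\operatorname{div}(q - \Pi_R q), y)_h \\
&= (q - \Pi_R q, \operatorname{grad} y)_h + ((I - \Pi_p)\operatorname{div} q, (I - \Pi_p) y)_h,
\end{align*}
where we have again used (5.20). Hence
\[ \frac{|\langle \operatorname{tr}_n(q - \Pi_R q), y\rangle_h|}{\|y\|_{H^1(\Omega_h)}} \lesssim h^s |q|_{H^s(\Omega)} + h \|(I - \Pi_p)\operatorname{div} q\|_\Omega. \tag{5.24} \]
When 1/2 < 𝑠 < 1 we can estimate the last term simply by using the fact that the
norm of the orthogonal projection 𝐼 − 𝛱 𝑝 equals one:
\[ h \|(I - \Pi_p)\operatorname{div} q\|_\Omega \le h \|\operatorname{div} q\|_\Omega, \qquad 1/2 < s < 1. \]
When 1 ≤ 𝑠 ≤ 𝑝 + 1, we use the standard approximation estimate for 𝐿 2 -projection:
\[ h \|(I - \Pi_p)\operatorname{div} q\|_\Omega \lesssim h\, h^{s-1} \|\operatorname{div} q\|_{H^{s-1}(\Omega)}, \qquad 1\le s\le p+1. \]
Using these two estimates to bound the last term in (5.24) and returning to (5.23),
\[ \|u - u_h\|_{H^1(\Omega)} + \|\hat q_n - \hat q_{n,h}\|_{H^{-1/2}(\partial\Omega_h)} \lesssim
\begin{cases}
h^s |u|_{H^{1+s}(\Omega)} + h \|\Delta u\|_\Omega, & 1/2 < s < 1, \\
h^s |u|_{H^{1+s}(\Omega)} + h^s |\Delta u|_{H^{s-1}(\Omega)}, & 1\le s\le p+1.
\end{cases} \tag{5.25} \]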
This gives optimal rates at reduced regularity requirements compared to (5.22).
For example, in the lowest-order case, if linear Lagrange elements are used for
approximating 𝑢, and if 𝑢 ∈ 𝐻 2 (𝛺), then the DPG error in 𝑢 is 𝑂(ℎ) (a convergence
rate and regularity requirement comparable to the standard finite element method),
and additionally, at the price of including a piecewise constant flux 𝑞ˆ 𝑛 , the DPG
method gives a flux error that is also 𝑂(ℎ).
Having shown a typical application of a Fortin operator to analyse a DPG method
in the above example, let us now proceed to detail the construction of the needed
operator $\Pi^{\mathrm{grad}}_r$. Let 𝐾 be an 𝑁-simplex and let
𝑃˚𝑟 (𝐾) = {𝑢 ∈ 𝑃𝑟 (𝐾) : 𝑢| 𝜕𝐾 = 0},
𝐵𝑟0 (𝐾) = {𝑢 ∈ 𝑃𝑟 (𝐾) : 𝑢| 𝐸 = 0 for all 𝐸 ∈ △ 𝑁 −2 𝐾 }.
Let 𝜆0 , . . . , 𝜆 𝑁 denote the standard linear barycentric coordinate functions of an
𝑁-simplex 𝐾, let 𝐹𝑖 be the facet in △ 𝑁 −1 𝐾 where 𝜆𝑖 vanishes, and let
\[
b_K = \prod_{j=0}^{N} \lambda_j, \qquad b_{F_i} = \frac{b_K}{\lambda_i} = \prod_{j \ne i} \lambda_j. \tag{5.26}
\]

Clearly 𝑏 𝐾 ∈ 𝑃 𝑁 +1 (𝐾) is the element bubble of the simplex 𝐾 and 𝑏 𝐹 ∈ 𝑃 𝑁 (𝐾) is

the facet bubble of any facet 𝐹 in △ 𝑁 −1 𝐾.
Lemma 5.6. Let 𝑟 = 𝑝 + 𝑁. Then
\[
\dim B_r^0(K) = \dim P_{p-1}(K) + \sum_{F \in \triangle_{N-1} K} \dim P_p(F), \tag{5.27}
\]

and for every 𝑣 ∈ 𝐻 1 (𝐾), there is a unique 𝛱𝑟0 𝑣 ∈ 𝐵𝑟0 (𝐾) satisfying
(𝛱𝑟0 𝑣 − 𝑣, 𝑞)𝐾 = 0 for all 𝑞 ∈ 𝑃 𝑝−1 (𝐾), (5.28a)
(𝛱𝑟0 𝑣 − 𝑣, 𝜇)𝐹 = 0 for all 𝜇 ∈ 𝑃 𝑝 (𝐹), 𝐹 ∈ △ 𝑁 −1 𝐾, (5.28b)
∥𝛱𝑟0 𝑣∥ 𝐿 2 (𝐾) + ℎ 𝐾 ∥ grad 𝛱𝑟0 𝑣∥ 𝐿 2 (𝐾) ≲ ∥𝑣∥ 𝐿 2 (𝐾) + ℎ 𝐾 ∥ grad 𝑣∥ 𝐿 2 (𝐾) . (5.28c)
Proof. Using the space of polynomials of vanishing trace, we can count the
dimensions of 𝐵𝑟0 (𝐾). Indeed, $\dim B_r^0(K) = \dim \mathring P_r(K) + \sum_{F \in \triangle_{N-1} K} \dim \mathring P_r(F)$.
Note that 𝑃˚𝑟 (𝐾) = 𝑏 𝐾 𝑃𝑟 − 𝑁 −1 (𝐾) and 𝑃˚𝑟 (𝐹) = 𝑏 𝐹 𝑃𝑟 − 𝑁 (𝐹) for any 𝐹 ∈ △ 𝑁 −1 𝐾.
Therefore, by choosing 𝑟 = 𝑝 + 𝑁, we have
dim 𝑃˚𝑟 (𝐾) = dim 𝑃 𝑝−1 (𝐾) and dim 𝑃˚𝑟 (𝐹) = dim 𝑃 𝑝 (𝐹),
and consequently (5.28a)–(5.28b) is a square system for 𝛱𝑟0 𝑣.

Now the existence of the stated 𝛱𝑟0 𝑣 will follow from uniqueness, that is, it
suffices to prove that if 𝑣 = 0, then 𝛱𝑟0 𝑣 = 0. Since 𝛱𝑟0 𝑣 ∈ 𝐵𝑟0 (𝐾), on any
face 𝐹 ∈ △ 𝑁 −1 𝐾, we may write (𝛱𝑟0 𝑣)| 𝐹 = 𝑏 𝐹 𝑤 𝑝 for some 𝑤 𝑝 ∈ 𝑃 𝑝 (𝐹). But
then (5.28b) implies that 𝛱𝑟0 𝑣 must vanish on 𝜕𝐾, so 𝛱𝑟0 𝑣 = 𝑏 𝐾 𝑧 𝑝−1 for some
𝑧 𝑝−1 ∈ 𝑃 𝑝−1 (𝐾). Then (5.28a) implies that 𝛱𝑟0 𝑣 = 0 on 𝐾. The estimate (5.28c)
now follows by a standard scaling argument using a reference 𝑁-simplex.
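The dimension count behind Lemma 5.6 can be checked mechanically. The following is a minimal sketch, not part of the original construction: the helper names `dim_P`, `moment_conditions` and `dim_B0` are hypothetical, and the only facts used are the standard formula dim 𝑃𝑚 = C(𝑚 + 𝑛, 𝑛) in 𝑛 variables and the bubble factorizations quoted in the proof.

```python
from math import comb

def dim_P(m, n):
    """Dimension of polynomials of degree <= m in n variables (0 if m < 0)."""
    return comb(m + n, n) if m >= 0 else 0

def moment_conditions(p, N):
    """Number of scalar conditions in (5.28a)-(5.28b) on an N-simplex:
    interior moments against P_{p-1}(K) plus moments against P_p(F)
    on each of the N + 1 facets."""
    return dim_P(p - 1, N) + (N + 1) * dim_P(p, N - 1)

def dim_B0(r, N):
    """dim B_r^0(K), counted as in the proof: element bubbles b_K * P_{r-N-1}(K)
    plus, for each facet, facet bubbles b_F * P_{r-N}(F)."""
    interior = dim_P(r - N - 1, N)
    per_facet = dim_P(r - N, N - 1)
    return interior + (N + 1) * per_facet

# With r = p + N the system (5.28a)-(5.28b) is square, e.g. on a tetrahedron:
print(dim_B0(1 + 3, 3), moment_conditions(1, 3))
```

For 𝑁 = 1 the count can be cross-checked independently, since no subsimplices of dimension 𝑁 − 2 exist and 𝐵𝑟0 (𝐾) is then all of 𝑃𝑟 (𝐾), of dimension 𝑟 + 1.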
Proof of Theorem 5.4. Let $v \in H^1(K)$ on an $N$-simplex $K$. Define
$\Pi^{\mathrm{grad}}_r v = \Pi^0_r (v - \bar v) + \bar v$, where $\bar v$ denotes the mean value of $v$ on $K$:
\[
\bar v = \frac{1}{|K|} \int_K v.
\]
Equations (5.13a) and (5.13b) immediately follow from (5.28a) and (5.28b) of
Lemma 5.6 since
\[
\Pi^{\mathrm{grad}}_r v - v = (\Pi^0_r - I)(v - \bar v).
\]
To prove (5.13c), we use (5.28c) and the Poincaré inequality as follows:
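\begin{align*}
\| \Pi^{\mathrm{grad}}_r v \|_{L^2(K)} &\le \| \bar v \|_{L^2(K)} + \| \Pi^0_r (v - \bar v) \|_{L^2(K)} \\
&\lesssim \| \bar v \|_{L^2(K)} + \| v - \bar v \|_{L^2(K)} + h_K \| \operatorname{grad}(v - \bar v) \|_{L^2(K)} \\
&\lesssim \| v \|_{L^2(K)} + h_K \| \operatorname{grad} v \|_{L^2(K)}
\end{align*}
and
\begin{align*}
h_K \| \operatorname{grad} \Pi^{\mathrm{grad}}_r v \|_{L^2(K)} &= h_K \| \operatorname{grad} \Pi^0_r (v - \bar v) \|_{L^2(K)} \\
&\lesssim \| v - \bar v \|_{L^2(K)} + h_K \| \operatorname{grad}(v - \bar v) \|_{L^2(K)} \\
&\lesssim h_K \| \operatorname{grad} v \|_{L^2(K)}.
\end{align*}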
These estimates together imply (5.13c).
When a lower-order reaction term, say (𝑢, 𝑦)𝛺 , is added to the Laplace formu-
lation (5.14), skimming through the analysis of Example 5.5, we immediately see
that we would need another Fortin operator where the moment condition (5.13a)
is strengthened to 𝑞 ∈ 𝑃 𝑝 (𝐾) in place of 𝑞 ∈ 𝑃 𝑝−1 (𝐾). To perform such modific-
ations easily, and also to better understand the structure of the Fortin operator we
have presented, it is useful to know an explicit representation of the prior Fortin
operator and its generalization, which we describe now.
Denote the set of (𝑁 + 1)-term multi-indices of length 𝑚 by
\[
\mathcal I_m^{N+1} = \Big\{ \beta \equiv (\beta_1, \dots, \beta_{N+1}) : \beta_i \ge 0 \text{ are integers and } |\beta| \equiv \sum_{i=1}^{N+1} \beta_i = m \Big\}. \tag{5.29}
\]
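On an $N$-simplex, recall that a basis for $P_m(K)$ is given by $\lambda^\alpha = \lambda_1^{\alpha_1} \lambda_2^{\alpha_2} \cdots \lambda_{N+1}^{\alpha_{N+1}}$
for all multi-indices $\alpha \in \mathcal I_m^{N+1}$. Let $\eta^K_\alpha = b_K \lambda^\alpha$ and let $\chi^K_\beta \in P_m(K)$ denote the
dual basis of $\{ \lambda^\alpha : \alpha \in \mathcal I_m^{N+1} \}$ in the $(b_K \,\cdot, \cdot)_K$ inner product, that is,
\[
\big( \eta^K_\alpha, \chi^K_\beta \big)_K = \delta^\beta_\alpha, \qquad \alpha, \beta \in \mathcal I_m^{N+1}, \tag{5.30}
\]
where $\delta^\beta_\alpha$ equals one or zero depending on whether $\alpha$ equals $\beta$ or not. Let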
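\[
\Pi^K_m v = \sum_{\alpha \in \mathcal I_m^{N+1}} \big( v, \chi^K_\alpha \big)_K \, \eta^K_\alpha, \tag{5.31a}
\]
a polynomial in $P_{m+N+1}(K)$. We see from (5.30) that
\[
\big( \Pi^K_m v, \chi^K_\beta \big)_K = \big( v, \chi^K_\beta \big)_K \quad \text{for any } \beta \in \mathcal I_m^{N+1},
\]
so we obtain
\[
\big( \Pi^K_m v - v, q \big)_K = 0 \quad \text{for all } q \in P_m(K),\ v \in L^2(K), \tag{5.31b}
\]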
after expanding 𝑞 in the 𝜒𝛽𝐾 -basis.
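The construction (5.30)–(5.31) can be tried out in the simplest setting 𝑁 = 1. The following is a minimal sketch under stated assumptions: the element is 𝐾 = [0, 1], the barycentric coordinates are 1 − 𝑥 and 𝑥, all integrals are approximated by Gauss–Legendre quadrature, and the function name `bubble_moment_projection_1d` is hypothetical. It builds the dual basis from the Gram matrix of the bubble-weighted inner product and returns nodal values of the projected function.

```python
import numpy as np

def bubble_moment_projection_1d(v, m, nquad=30):
    """Values of Pi_m^K v (cf. (5.31a)) at quadrature nodes of K = [0, 1]."""
    # Gauss-Legendre rule mapped from [-1, 1] to [0, 1].
    t, w = np.polynomial.legendre.leggauss(nquad)
    x, w = 0.5 * (t + 1.0), 0.5 * w
    # Barycentric monomials lambda^alpha with |alpha| = m: (1-x)^(m-j) * x^j.
    lam = np.stack([(1.0 - x) ** (m - j) * x ** j for j in range(m + 1)])
    b = x * (1.0 - x)                  # element bubble b_K in one dimension
    # Gram matrix of the (b_K . , .)_K inner product on the lambda-basis.
    M = (lam * (b * w)) @ lam.T
    # Dual basis chi_beta, so that (b_K lambda^alpha, chi_beta)_K = delta.
    chi = np.linalg.solve(M, lam)
    coeff = (chi * w) @ v(x)           # moments (v, chi_alpha)_K
    Pv = coeff @ (b * lam)             # sum_alpha (v, chi_alpha) * eta_alpha
    return x, w, lam, Pv

x, w, lam, Pv = bubble_moment_projection_1d(np.exp, m=3)
moments = lam @ (w * (Pv - np.exp(x)))  # should vanish up to round-off, cf. (5.31b)
```

Since every basis function carries the factor 𝑏 𝐾 , the projected function vanishes at both endpoints, which is what allows facet and interior corrections to be combined without interfering.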
Since this construction works on a simplex of any dimension, we can repeat it
on any subsimplex of 𝐾. The barycentric coordinates of a subsimplex 𝐹 ∈ △ 𝑁 −1 𝐾
are simply the restrictions of those of 𝐾 to 𝐹, omitting the one that vanishes on 𝐹.
Using them, we repeat the construction, now with shorter multi-indices 𝛼, 𝛽 ∈ I 𝑘𝑁 .
Namely, let 𝜂 𝐹𝛼 = 𝑏 𝐹 𝜆 𝛼 and let 𝜒𝛽𝐹 ∈ 𝑃 𝑘 (𝐹) form the dual basis of 𝜆 𝛼 in the
(𝑏 𝐹 ·, ·)𝐹 inner product. Then set
\[
\Pi^F_k v = \sum_{\alpha \in \mathcal I_k^{N}} \big( v, \chi^F_\alpha \big)_F \, \eta^F_\alpha. \tag{5.32a}
\]

This is a polynomial in 𝑃 𝑘+𝑁 (𝐾) since each 𝜂 𝐹𝛼 is a product of 𝑘 + 𝑁 barycentric

coordinates on 𝐹 that have a natural polynomial extension into 𝐾. Furthermore,
this polynomial vanishes on all facets in △ 𝑁 −1 𝐾 different from 𝐹. As in (5.31b),
the analogue of (5.30) on 𝐹 now gives

\[
\big( \Pi^F_k v - v, q \big)_F = 0 \quad \text{for all } q \in P_k(F),\ v \in H^1(K). \tag{5.32b}
\]
Let 𝛱0 denote the 𝐿 2 (𝐾)-orthogonal projection to constants on 𝐾. We put these
ingredients together to construct the polynomial
∑︁
grad
𝛱𝑘,𝑚 𝑣 = 𝛱0 𝑣 + 𝛱𝑘𝐹 + 𝛱𝑚𝐾 𝐼 − 𝛱𝑘𝐹 (𝐼 − 𝛱0 )𝑣. (5.33)
𝐹 ∈ △ 𝑁 −1 𝐾

Note that its trace on a facet 𝐹 is determined solely by the $\Pi^F_k$-contribution since
the traces after an application of $\Pi^K_m$ or $\Pi^{\tilde F}_k$ are zero on 𝐹 for any $\tilde F \ne F$. Note
that when 𝑚 = 𝑝 − 1 and 𝑘 = 𝑝, we recover the operator of Theorem 5.4.
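Theorem 5.7 (A more general $H^1(K)$ Fortin operator). The above-defined operator
\[
\Pi^{\mathrm{grad}}_{k,m} : H^1(K) \to P_r(K) \quad \text{for } r = \max(m + N + 1,\ k + N)
\]
satisfies, for every $v \in H^1(K)$ on an $N$-simplex $K$, the moment conditions
\begin{align*}
\big( \Pi^{\mathrm{grad}}_{k,m} v - v, q \big)_K &= 0 && \text{for all } q \in P_m(K), \tag{5.34a} \\
\big( \Pi^{\mathrm{grad}}_{k,m} v - v, \mu \big)_F &= 0 && \text{for all } \mu \in P_k(F),\ F \in \triangle_{N-1} K, \tag{5.34b} \\
\Pi^{\mathrm{grad}}_{k,m} c &= c && \text{for all constant functions } c \text{ on } K, \tag{5.34c}
\end{align*}
and the norm estimates
\begin{align*}
\| \Pi^{\mathrm{grad}}_{k,m} v \|_{L^2(K)} &\lesssim \| v \|_{L^2(K)} + h_K \| \operatorname{grad} v \|_{L^2(K)}, \tag{5.34d} \\
\| \operatorname{grad} \Pi^{\mathrm{grad}}_{k,m} v \|_{L^2(K)} &\lesssim \| \operatorname{grad} v \|_{L^2(K)}. \tag{5.34e}
\end{align*}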
Proof. Using (5.31) and (5.32), it is easy to see that (5.34a) and (5.34b) hold.
Property (5.34c) is immediate from (5.33). The norm estimate of (5.34d) is proved
along the same lines as (5.13) by scaling arguments, and the estimate (5.34e)
follows from (5.34d) and (5.34b).

Bibliographical notes. Theorem 5.4 and the construction of the Fortin operator
$\Pi^{\mathrm{grad}}_{p+3}$ are taken from Gopalakrishnan and Qiu (2014). Its generalization in (5.33)
is based on the recent work of Führer and Heuer (2024). They further show that, by
generalizing such polynomial expressions to certain exponential ones, estimates
like (5.13) but with $\| \cdot \|_{H^1(K)}$ replaced by
\[
\| v \|_a = \big( \| v \|_K^2 + a \| \operatorname{grad} v \|_K^2 \big)^{1/2}
\]
for some small parameter 𝑎, can be obtained robustly in the parameter 𝑎 as 𝑎 → 0.
The discrete stability of the primal DPG method for the Laplace equation, discussed
in Example 5.5, was first considered in Demkowicz and Gopalakrishnan (2013).
There, and in earlier DPG analyses such as that of Demkowicz and Gopalakrishnan
(2011a), error estimates comparable to (5.22) that demand extra regularity can be
found. The discussion in Example 5.5, leading to (5.25) with better regularity
requirements, is taken from the more recent work of Führer (2018, Theorem 5).

5.3. A Fortin operator for divergence in 𝑁 dimensions

In this subsection we construct a continuous linear operator $\Pi^{\mathrm{div}}_{p+3}$ on 𝐻(div, 𝐾)
satisfying certain moment conditions that are useful in the analysis of DPG methods
where 𝐻(div, 𝛺 ℎ ) features in the test space.
We will perform our construction on the reference unit $N$-simplex $\hat K$ and map it
to a general $N$-simplex $K$. Let $S_K : \hat K \to K$ be a one-to-one affine map that maps
$\hat K$ onto a general $N$-simplex $K$, and let $[S'_K]$ denote the Jacobian derivative matrix
of $S_K$. Given a vector field $q$ on $K$, we use the following pullback to map
it to $\hat K$:
\[
\Psi(q) = (\det [S'_K]) \, [S'_K]^{-1} (q \circ S_K). \tag{5.35}
\]

It is easy to see that

div(Ψ(𝑞)) = (det[𝑆 ′𝐾 ]) (div 𝑞) ◦ 𝑆 𝐾 . (5.36)
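The identity (5.36) can be sanity-checked for affine data, where both sides are constants. The following is a minimal sketch under stated assumptions: the map is 𝑆(𝑥ˆ) = 𝐽𝑥ˆ + 𝑡 with an invertible matrix 𝐽, the field is 𝑞(𝑥) = 𝐴𝑥 + 𝑐, and the function name `piola_divergence_check` is hypothetical. The translation 𝑡 and the constant 𝑐 drop out of both divergences, so they do not appear in the code.

```python
import numpy as np

def piola_divergence_check(J, A):
    """For S(xhat) = J xhat + t and q(x) = A x + c, return the pair
    (div of Psi(q), det(J) * div q). Both are constant functions."""
    detJ = np.linalg.det(J)
    # Psi(q)(xhat) = detJ * J^{-1} (A (J xhat + t) + c),
    # so its gradient with respect to xhat is detJ * J^{-1} A J.
    grad_pullback = detJ * np.linalg.solve(J, A @ J)
    return np.trace(grad_pullback), detJ * np.trace(A)

rng = np.random.default_rng(0)
J = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)   # a generic invertible Jacobian
A = rng.standard_normal((3, 3))
lhs, rhs = piola_divergence_check(J, A)              # the two numbers should agree
```

The agreement rests on the similarity invariance of the trace, which is the algebraic content of (5.36) for affine maps.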
Recall the Raviart–Thomas element 𝑅 𝑝 (𝐾), 𝑝 ≥ 1, previously considered in (5.11).
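Theorem 5.8 (A Fortin operator on $H(\mathrm{div}, K)$). On any $N$-simplex $K$, for any
integer $k \ge 0$, an operator $\Pi^{\mathrm{div}}_{k+1} : H(\mathrm{div}, K) \to R_{k+1}(K)$ can be constructed such
that for all $\tau \in H(\mathrm{div}, K)$, we have the commutativity property
\[
\operatorname{div} \Pi^{\mathrm{div}}_{k+1} \tau = \Pi_k \operatorname{div} \tau, \tag{5.37}
\]
the moment conditions
\begin{align*}
\big( \Pi^{\mathrm{div}}_{k+1} \tau - \tau, q \big)_K &= 0 && \text{for all } q \in P_{k-1}(K)^N, \tag{5.38a} \\
\big( n \cdot (\Pi^{\mathrm{div}}_{k+1} \tau - \tau), \mu \big)_{\partial K} &= 0 && \text{for all } \mu \in P_k(K) \tag{5.38b}
\end{align*}
and the norm bounds
\begin{align*}
\| \Pi^{\mathrm{div}}_{k+1} \tau \|_K &\lesssim \| \tau \|_K + h_K \| \operatorname{div} \tau \|_K, \tag{5.38c} \\
\| \operatorname{div} \Pi^{\mathrm{div}}_{k+1} \tau \|_K &\le \| \operatorname{div} \tau \|_K. \tag{5.38d}
\end{align*}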
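Proof. We will first construct the operator on the reference unit $N$-simplex $\hat K$. Let
\[
P_k^{\perp}(\partial \hat K) = \{ \mu \in L^2(\partial \hat K) : \mu|_F \in P_k(F) \text{ for all } F \in \triangle_{N-1} \hat K \text{ and }
(\mu, q)_{\partial \hat K} = 0 \text{ for all } q \in P_k(\hat K) \}.
\]
It is the $L^2(\partial \hat K)$-orthogonal complement of $\operatorname{tr}(P_k(\hat K))$ in the space of piecewise
polynomials on $\partial \hat K$. Let
\[
B^{\mathrm{div}}_{k+1}(\hat K) = \{ \hat\tau \in R_{k+1}(\hat K) : (\hat p_\perp, \hat\tau \cdot \hat n)_{\partial \hat K} = 0 \text{ for all } \hat p_\perp \in P^{\perp}_{k+1}(\partial \hat K) \},
\]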

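where $\hat n$ is the unit outward normal on $\partial \hat K$ and $R_{k+1}(\hat K)$ is as in (5.11). We claim
that the equations
\begin{align*}
\big( \hat\Pi^{\mathrm{div}}_{k+1} \hat\tau, \hat q \big)_{\hat K} &= (\hat\tau, \hat q)_{\hat K} && \text{for all } \hat q \in P_{k-1}(\hat K)^N, \tag{5.39a} \\
\big( \hat\Pi^{\mathrm{div}}_{k+1} \hat\tau \cdot \hat n, \hat w \big)_{\partial \hat K} &= \langle \hat\tau \cdot \hat n, \hat w \rangle_{H^{1/2}(\partial \hat K)} && \text{for all } \hat w \in P_k(\hat K) \tag{5.39b}
\end{align*}
uniquely determine $\hat\Pi^{\mathrm{div}}_{k+1} \hat\tau \in B^{\mathrm{div}}_{k+1}(\hat K)$ and thus define a linear continuous operator
$\hat\Pi^{\mathrm{div}}_{k+1} : H(\mathrm{div}, \hat K) \to B^{\mathrm{div}}_{k+1}(\hat K)$.
Indeed, if the right-hand sides of (5.39) vanish, then since $\hat\Pi^{\mathrm{div}}_{k+1} \hat\tau$ is in $B^{\mathrm{div}}_{k+1}(\hat K) \subset R_{k+1}(\hat K)$,
we find that $\hat\Pi^{\mathrm{div}}_{k+1} \hat\tau$ is a function in the Raviart–Thomas space all of whose
canonical degrees of freedom vanish (see e.g. Arnold, Falk and Winther 2006 or
Nédélec 1980, Definition 5), so
\[
\hat\Pi^{\mathrm{div}}_{k+1} \hat\tau = 0.
\]
Since the system (5.39) is square, we conclude that it uniquely defines $\hat\Pi^{\mathrm{div}}_{k+1} \hat\tau$.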

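Next, we map this operator to a general simplex $K$ using the Piola transform $\Psi$
in (5.35):
\[
\Pi^{\mathrm{div}}_{k+1} = \Psi^{-1} \circ \hat\Pi^{\mathrm{div}}_{k+1} \circ \Psi.
\]
By standard mapping arguments, the stated moment conditions of $\Pi^{\mathrm{div}}_{k+1}$ now follow
from (5.39). The moment conditions also imply that for any $\omega \in P_k(K)$,
\begin{align*}
\big( \operatorname{div} \Pi^{\mathrm{div}}_{k+1} \tau, \omega \big)_K &= - \big( \Pi^{\mathrm{div}}_{k+1} \tau, \operatorname{grad} \omega \big)_K + \big( \Pi^{\mathrm{div}}_{k+1} \tau \cdot n, \omega \big)_{\partial K} \\
&= - (\tau, \operatorname{grad} \omega)_K + (\omega, \tau \cdot n)_{\partial K} \\
&= (\operatorname{div} \tau, \omega)_K,
\end{align*}
thus proving the commutativity property (5.37). Also, since $\hat\Pi^{\mathrm{div}}_{k+1}$ is a continuous
operator on $H(\mathrm{div}, \hat K)$, standard scaling arguments prove
\[
\| \Pi^{\mathrm{div}}_{k+1} \tau \|_K + h_K \| \operatorname{div} \Pi^{\mathrm{div}}_{k+1} \tau \|_K \lesssim \| \tau \|_K + h_K \| \operatorname{div} \tau \|_K.
\]
Combined with the better bound on the divergence term,
\[
\| \operatorname{div} \Pi^{\mathrm{div}}_{k+1} \tau \|_K \le \| \operatorname{div} \tau \|_K,
\]
which obviously follows from (5.37), the stated norm bound is also proved.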

Bibliographical notes. Theorem 5.8 and its proof are from Gopalakrishnan and
Qiu (2014). For a construction in 𝐻(div, 𝐾) in the same spirit as Theorem 5.7, see
Führer and Heuer (2024).

5.4. Commuting Fortin operators in three dimensions

Having completed the previous discussions of a Fortin operator $\Pi^{\mathrm{grad}}_r$ on the broken
$H^1$ space, as well as an $H(\mathrm{div}, K)$ Fortin operator $\Pi^{\mathrm{div}}_{k+1}$, we now proceed to show that
they are part of a family of local commuting Fortin operators. We restrict ourselves
to the three-dimensional (3D) case $N = 3$. Let
𝑁 𝑝 (𝐷) = 𝑃 𝑝−1 (𝐷)3 + 𝑥 × 𝑃 𝑝−1 (𝐷)3
denote the Nédélec element (Nédélec 1980). Together with the Raviart–Thomas
element 𝑅 𝑝+1 (𝐾) in (5.11), it forms the following well-known (see e.g. Arnold et al.
2006) exact complex:
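\[
0 \longrightarrow P_{p+1}(K)/\mathbb{R} \xrightarrow{\ \mathrm{grad}\ } N_{p+1}(K) \xrightarrow{\ \mathrm{curl}\ } R_{p+1}(K) \xrightarrow{\ \mathrm{div}\ } P_p(K) \longrightarrow 0. \tag{5.40}
\]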
We will prove the following result in this subsection. There 𝛱 𝑝 denotes the 𝐿 2 -
orthogonal projection onto 𝑃 𝑝 (𝐾). In order to verify condition (5.5) in various
DPG convergence analyses, the moment conditions (5.43)–(5.48) listed below are
helpful.

Theorem 5.9 (Commuting 3D Fortin operators satisfying moment conditions).

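Let $p \ge 0$ be an integer. On any tetrahedron $K$, there are operators
\begin{align*}
\Pi^{\mathrm{grad}}_{p+3} &: H^1(K) \to P_{p+3}(K), \\
\Pi^{\mathrm{curl}}_{p+3} &: H(\mathrm{curl}, K) \to N_{p+3}(K), \\
\Pi^{\mathrm{div}}_{p+3} &: H(\mathrm{div}, K) \to R_{p+3}(K),
\end{align*}
such that for any $v \in H^1(K)$, $E \in H(\mathrm{curl}, K)$ and $\tau \in H(\mathrm{div}, K)$, the norm estimates
\begin{align*}
\| \Pi^{\mathrm{grad}}_{p+3} v \|_{H^1(K)} &\lesssim \| v \|_{H^1(K)}, \tag{5.41a} \\
\| \Pi^{\mathrm{curl}}_{p+3} E \|_{H(\mathrm{curl}, K)} &\lesssim \| E \|_{H(\mathrm{curl}, K)}, \tag{5.41b} \\
\| \Pi^{\mathrm{div}}_{p+3} \tau \|_{H(\mathrm{div}, K)} &\lesssim \| \tau \|_{H(\mathrm{div}, K)} \tag{5.41c}
\end{align*}
hold, the diagram
\[
\begin{array}{ccccccc}
H^1(K)/\mathbb{R} & \xrightarrow{\ \mathrm{grad}\ } & H(\mathrm{curl}, K) & \xrightarrow{\ \mathrm{curl}\ } & H(\mathrm{div}, K) & \xrightarrow{\ \mathrm{div}\ } & L^2(K) \\
\downarrow \Pi^{\mathrm{grad}}_{p+3} & & \downarrow \Pi^{\mathrm{curl}}_{p+3} & & \downarrow \Pi^{\mathrm{div}}_{p+3} & & \downarrow \Pi_{p+2} \\
P_{p+3}(K)/\mathbb{R} & \xrightarrow{\ \mathrm{grad}\ } & N_{p+3}(K) & \xrightarrow{\ \mathrm{curl}\ } & R_{p+3}(K) & \xrightarrow{\ \mathrm{div}\ } & P_{p+2}(K)
\end{array} \tag{5.42}
\]
commutes, and the following identities hold:
\begin{align*}
\big( \Pi^{\mathrm{grad}}_{p+3} v - v, q \big)_K &= 0 && \text{for all } q \in P_{p-1}(K), \tag{5.43} \\
\big( \Pi^{\mathrm{grad}}_{p+3} v - v, n \cdot \sigma \big)_{\partial K} &= 0 && \text{for all } \sigma \in R_{p+1}(K), \tag{5.44} \\
\big( \Pi^{\mathrm{curl}}_{p+3} E - E, v \big)_K &= 0 && \text{for all } v \in P_p(K)^3, \tag{5.45} \\
\big( n \times (\Pi^{\mathrm{curl}}_{p+3} E - E), w \big)_{\partial K} &= 0 && \text{for all } w \in P_{p+1}(K)^3, \tag{5.46} \\
\big( \Pi^{\mathrm{div}}_{p+3} \tau - \tau, q \big)_K &= 0 && \text{for all } q \in P_{p+1}(K)^3, \tag{5.47} \\
\big( n \cdot (\Pi^{\mathrm{div}}_{p+3} \tau - \tau), \mu \big)_{\partial K} &= 0 && \text{for all } \mu \in P_{p+2}(K). \tag{5.48}
\end{align*}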
Before proceeding to prove this theorem, we note that the operator $\Pi^{\mathrm{grad}}_{p+3}$ stated
in the theorem is the same as $\Pi^{\mathrm{grad}}_r$ with $r = p + N$ constructed in Theorem 5.4,
restricted to $N = 3$ dimensions (since (5.43)–(5.44) is the same as (5.28a)–(5.28b)).
However, we are yet to prove the relevant commutativity property. Note also that
the bound (5.41c) and the moment conditions (5.47) and (5.48) hold after putting
$N = 3$ and replacing $k$ with $p + 2$ in Theorem 5.8. It also gives the commutativity
property of $\Pi^{\mathrm{div}}_{p+3}$ stated in the last part of (5.42).
It remains to construct $\Pi^{\mathrm{curl}}_{p+3}$. We will perform our construction on the reference
unit $N$-simplex $\hat K$ and map it to a general $N$-simplex $K$. As before, let $S_K : \hat K \to K$
be an affine homeomorphism from $\hat K$ to a general tetrahedron $K$. Given a vector
field $E$ on $K$, we use the following pullback to map it to $\hat K$:
Φ(𝐸) = [𝑆 ′𝐾 ] 𝑡 (𝐸 ◦ 𝑆 𝐾 ). (5.49)
It is easy to see that
curl(Φ(𝐸)) = Ψ(curl 𝐸). (5.50)
We begin with a preliminary lemma whose relevance will be clear soon. Let
𝑁˚ 𝑝 (𝐾) = {𝑞 ∈ 𝑁 𝑝 (𝐾) : 𝑛 × 𝑞| 𝜕𝐾 = 0}.
Lemma 5.10. For any 𝑝 ≥ 0, if 𝐹 ∈ 𝐻(curl, 𝐾) satisfies
(curl 𝐹, 𝑤)𝐾 = 0 for all 𝑤 ∈ 𝑁˚ 𝑝+1 (𝐾), (5.51)
then there is a 𝜙 ∈ 𝑃 𝑝+3 (𝐾) such that
(𝐹 + grad 𝜙, 𝑣)𝐾 = 0 for all 𝑣 ∈ 𝑃 𝑝 (𝐾)3 . (5.52)
Proof. Proving (5.52) amounts to proving that there is a 𝜙 ∈ 𝑃 𝑝+3 (𝐾) solving
𝐴𝜙 = 𝑏, (5.53)
where 𝐴 and 𝑏 are defined using the 𝐿 2 (𝐾)3 -orthogonal projection 𝛱 𝑝 into
𝑃 𝑝 (𝐾)3 by
𝐴 = 𝛱 𝑝 grad : 𝑃 𝑝+3 (𝐾) → 𝑃 𝑝 (𝐾)3 , 𝑏 = −𝛱 𝑝 𝐹.
In other words, it suffices to show that 𝑏 ∈ range(𝐴) = ker(𝐴∗ )⊥ , where 𝐴∗ is the
𝐿 2 -adjoint of 𝐴.
Any 𝑞 ∈ 𝑃 𝑝 (𝐾)3 is in ker(𝐴∗ ) if and only if
(𝑢, 𝐴∗ 𝑞) = (𝐴𝑢, 𝑞)𝐾 = (grad 𝑢, 𝑞)𝐾
= −(𝑢, div 𝑞)𝐾 + (𝑢, 𝑞 · 𝑛)𝜕𝐾 = 0 (5.54)
for all 𝑢 ∈ 𝑃 𝑝+3 (𝐾). Recall the bubble functions 𝑏 𝐾 and 𝑏 𝐹 from (5.26). Choosing
𝑢 = 𝑏 𝐾 div 𝑞 in (5.54), we find that div 𝑞 = 0. Then, removing the term containing
div 𝑞 and choosing 𝑢 = (𝑞 · 𝑛 𝐹 )𝑏 𝐹 in (5.54), we find that (𝑞 · 𝑛 𝐹 )| 𝐹 = 0, an argument
that can be repeated on every facet 𝐹. Including the obvious converse as well, we have
proved that 𝑞 ∈ ker(𝐴∗ ) if and only if div 𝑞 = 0 and 𝑞 · 𝑛| 𝜕𝐾 = 0, that is,
ker(𝐴∗ ) = curl 𝑁˚ 𝑝+1 (𝐾).
Note that for any 𝑤 ∈ 𝑁˚ 𝑝+1 (𝐾), by the given condition (5.51),
(𝑏, curl 𝑤) = −(𝐹, curl 𝑤) = −(curl 𝐹, 𝑤) = 0,
so 𝑏 ∈ ker(𝐴∗ )⊥ = range(𝐴) and (5.53) has a solution.
Before constructing the required $\Pi^{\mathrm{curl}}_{p+3}$, we need an intermediate operator $\hat\Pi^{c}_{p+3}$
on a reference unit tetrahedron $\hat K$. Let
\[
D_{p+2}(\hat K) = \{ r \in R_{p+3}(\hat K) : \operatorname{div} r = 0 \} = \operatorname{curl} N_{p+3}(\hat K)
\]
and
\[
C = \operatorname{curl} : N_{p+3}(\hat K) \to D_{p+2}(\hat K).
\]
Using $\hat\Pi_p$, the $L^2(\hat K)^3$-orthogonal projection into $P_p(\hat K)^3$, define $B = \hat\Pi_p : N_{p+3}(\hat K) \to P_p(\hat K)^3$.
The codomains of $B$ and $C$ are endowed with the $L^2$-norm, which then

naturally define their 𝐿 2 -adjoints 𝐵∗ and 𝐶 ∗ . Note that one of the commutativity
properties in (5.42) and one of the moment conditions (5.45) read, respectively, as
follows:
\[
C F = \Pi^{\mathrm{div}}_{p+3} \operatorname{curl} E, \qquad B F = \hat\Pi_p E, \tag{5.55}
\]
with $F = \Pi^{\mathrm{curl}}_{p+3} E$. Accordingly, we seek the result of the application of the Fortin
operator in the set
\[
S(E) = \{ F \in N_{p+3}(\hat K) : F \text{ satisfies (5.55)} \}. \tag{5.56}
\]
For defining the intermediate operator $\hat\Pi^{c}_{p+3}$, consider the problem of finding
$\hat\Pi^{c}_{p+3} E \in N_{p+3}(\hat K)$, $\lambda \in P_p(\hat K)^3$ and $\mu \in D_{p+2}(\hat K)$ satisfying
\begin{align*}
\hat\Pi^{c}_{p+3} E + B^* \lambda + C^* \mu &= 0, \tag{5.57a} \\
B \hat\Pi^{c}_{p+3} E &= \hat\Pi_p E, \tag{5.57b} \\
C \hat\Pi^{c}_{p+3} E &= \hat\Pi^{\mathrm{div}}_{p+3} \operatorname{curl} E, \tag{5.57c}
\end{align*}
where $\hat\Pi^{\mathrm{div}}_{p+3}$ is as defined in (5.39). One may view $\lambda$ and $\mu$ above as Lagrange
multipliers for the constrained minimization problem
\[
\hat\Pi^{c}_{p+3} E = \arg\min_{F \in S(E)} \| F \|^2_{\hat K}, \tag{5.58}
\]
where the minimization is over the affine set in (5.56).
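The algebraic structure of (5.57) and (5.58) can be illustrated with plain matrices. The following is a minimal sketch under stated assumptions, not the operators of this section: `Bmat` and `Cmat` are arbitrary real matrices standing in for 𝐵 and 𝐶, all inner products are Euclidean, and the function name `min_norm_constrained` is hypothetical. A least-squares solve is used because the multipliers need not be unique when constraints are redundant.

```python
import numpy as np

def min_norm_constrained(Bmat, Cmat, b_rhs, c_rhs):
    """Minimize |F|^2 subject to Bmat F = b_rhs and Cmat F = c_rhs by solving
    the saddle-point system F + A^T mult = 0, A F = g (cf. (5.57)),
    where A stacks Bmat over Cmat. The data are assumed to be consistent."""
    A = np.vstack([Bmat, Cmat])
    g = np.concatenate([b_rhs, c_rhs])
    k, n = A.shape
    KKT = np.block([[np.eye(n), A.T], [A, np.zeros((k, k))]])
    rhs = np.concatenate([np.zeros(n), g])
    # The system may be singular in the multipliers; lstsq returns one solution.
    sol = np.linalg.lstsq(KKT, rhs, rcond=1e-12)[0]
    return sol[:n], sol[n:]

rng = np.random.default_rng(1)
Bmat = rng.standard_normal((2, 6))
Cmat = rng.standard_normal((3, 6))
Cmat[2] = Cmat[0] + Cmat[1]              # a deliberately redundant constraint
F0 = rng.standard_normal(6)              # some feasible point, so S is non-empty
F, mult = min_norm_constrained(Bmat, Cmat, Bmat @ F0, Cmat @ F0)
```

In this finite-dimensional analogue the minimizer coincides with the pseudoinverse solution of the stacked constraints, which mirrors the statement that only the first solution component of (5.57) is unique.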

We claim that there exists a unique $\hat\Pi^{c}_{p+3} E \in N_{p+3}(\hat K)$ satisfying (5.57). First
observe that (5.37) implies $\operatorname{div} \hat\Pi^{\mathrm{div}}_{p+3} \operatorname{curl} E = 0$, that is, by the exactness of (5.40),
there is an $\hat E_{p+3} \in N_{p+3}(\hat K)$ such that $\operatorname{curl} \hat E_{p+3} = \hat\Pi^{\mathrm{div}}_{p+3} \operatorname{curl} E$. By the moment
condition (5.47) of $\Pi^{\mathrm{div}}_{p+3}$, $F = \hat E_{p+3} - E$ satisfies $(\operatorname{curl} F, w) = 0$ for all $w \in P_{p+1}(\hat K)^3$,
and in particular, the condition (5.51) of Lemma 5.10. The lemma then gives the
existence of $\phi \in P_{p+3}(\hat K)$ such that $G = \hat E_{p+3} + \operatorname{grad} \phi \in N_{p+3}(\hat K)$ satisfies
\[
\begin{bmatrix} B \\ C \end{bmatrix} G = \begin{bmatrix} \hat\Pi_p E \\ \hat\Pi^{\mathrm{div}}_{p+3} \operatorname{curl} E \end{bmatrix},
\]
that is, the right-hand side of (5.57) is in the range of $\begin{bmatrix} B \\ C \end{bmatrix}$ (or equivalently $S(E)$
is a non-empty feasible set of constraints). Hence, by standard arguments (see
Brezzi and Fortin 1991, Proposition 1.1, p. 38), there exists a solution to (5.57) and
moreover the $\hat\Pi^{c}_{p+3} E$ component of the solution is unique. The linearity of $\hat\Pi^{c}_{p+3} E$
with respect to the right-hand sides of (5.57) is obvious and the right-hand sides
in turn depend linearly on $E$. Furthermore, since the ranges of $B$ and $C$ are closed
finite-dimensional spaces, there is a $c_{\hat K} > 0$ such that (see Brezzi and Fortin 1991,
Proposition 1.2, p. 39) the linear operator $\hat\Pi^{c}_{p+3} : H(\mathrm{curl}, \hat K) \to N_{p+3}(\hat K)$ satisfies
\[
\| \hat\Pi^{c}_{p+3} E \|_{H(\mathrm{curl}, \hat K)} \le c_{\hat K} \| E \|_{H(\mathrm{curl}, \hat K)}. \tag{5.59}
\]
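Next we map $\hat\Pi^{c}_{p+3}$ from $\hat K$ to an operator on any shape-regular tetrahedron $K$
using the covariant pullback $\Phi$ in (5.49):
\[
\Pi^{c}_{p+3} = \Phi^{-1} \circ \hat\Pi^{c}_{p+3} \circ \Phi.
\]
Lemma 5.11. On any tetrahedron $K$, the operator $\Pi^{c}_{p+3} : H(\mathrm{curl}, K) \to N_{p+3}(K)$
satisfies
\begin{align*}
\operatorname{curl} \Pi^{c}_{p+3} E &= \Pi^{\mathrm{div}}_{p+3} \operatorname{curl} E && \text{for all } E \in H(\mathrm{curl}, K), \tag{5.60} \\
\big( \Pi^{c}_{p+3} E - E, v \big)_K &= 0 && \text{for all } v \in P_p(K)^3, \tag{5.61} \\
\big( n \times (\Pi^{c}_{p+3} E - E), w \big)_{\partial K} &= 0 && \text{for all } w \in P_{p+1}(K)^3, \tag{5.62} \\
\| \Pi^{c}_{p+3} E \|_{H(\mathrm{curl}, K)} &\lesssim \| E \|_{H(\mathrm{curl}, K)}. \tag{5.63}
\end{align*}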

Proof. Mapping the equation (5.57b) from $\hat K$ to $K$, we obtain (5.61). It is easy
to see that $\operatorname{curl}(\Phi(E)) = \Psi(\operatorname{curl} E)$ for any $E \in H(\mathrm{curl}, K)$. Using it, we note that
mapping (5.57c) from $\hat K$ to $K$ we obtain the commutativity property (5.60) on $K$.
To prove the extra boundary moment condition (5.62), we substitute $v = \operatorname{curl} w$ for
some $w \in P_{p+1}(K)^3$ into (5.61) and integrate by parts to get
\begin{align*}
0 &= \big( \Pi^{c}_{p+3} E - E, \operatorname{curl} w \big)_K \\
&= - \big( n \times (\Pi^{c}_{p+3} E - E), w \big)_{\partial K} + \big( \operatorname{curl}(\Pi^{c}_{p+3} E - E), w \big)_K,
\end{align*}
and the last term vanishes by (5.60) and the moment condition (5.47) of $\Pi^{\mathrm{div}}_{p+3}$.
To prove (5.63), note that the bound (5.59) and standard scaling arguments
(detailed in Carstensen et al. 2016, eq. (52)) imply
\[
\| \Pi^{c}_{p+3} E \|^2_K \lesssim \| E \|^2_K + h_K^2 \| \operatorname{curl} E \|^2_K.
\]
Additionally, since (5.57c) and (5.41c) imply that $\| \operatorname{curl} \Pi^{c}_{p+3} E \|_K \lesssim \| \operatorname{curl} E \|_K$,
the estimate (5.63) follows.
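Recall that any $E \in H(\mathrm{curl}, K)$ admits the unique orthogonal Helmholtz decomposition
\[
E = \tilde E + \operatorname{grad} \psi, \tag{5.64}
\]
where $\psi \in H^1(K)$ has zero mean value $\bar\psi = |K|^{-1} \int_K \psi = 0$, and $\tilde E \in H(\mathrm{curl}, K)$ is
such that $(\tilde E, \operatorname{grad} \varphi)_K = 0$ for all $\varphi \in H^1(K)$. Using the Helmholtz decomposition
(5.64) of $E$, define
\[
\Pi^{\mathrm{curl}}_{p+3} E = \Pi^{c}_{p+3} \tilde E + \operatorname{grad} \Pi^{\mathrm{grad}}_{p+3} \psi. \tag{5.65}
\]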

We proceed to prove that this operator has all the required properties.

Proof of Theorem 5.9. We have already shown that the $N$-dimensional operator
$\Pi^{\mathrm{div}}_{p+3}$ in Theorem 5.8, restricted to the $N = 3$ case, satisfies all the properties stated
in the theorem. We have also shown that the operator in Theorem 5.4, restricted
to $N = 3$, satisfies all the stated properties of $\Pi^{\mathrm{grad}}_{p+3}$ except for its commutativity
property involving $\Pi^{\mathrm{curl}}_{p+3}$. Hence, to finish the proof, we now proceed to prove
that the $\Pi^{\mathrm{curl}}_{p+3} E$ defined in (5.65) satisfies the norm bound (5.41b), the moment
conditions (5.45)–(5.46), as well as the commutativity properties
\begin{align*}
\operatorname{curl} \Pi^{\mathrm{curl}}_{p+3} E &= \Pi^{\mathrm{div}}_{p+3} \operatorname{curl} E, \tag{5.66} \\
\Pi^{\mathrm{curl}}_{p+3} \operatorname{grad} \phi &= \operatorname{grad} \Pi^{\mathrm{grad}}_{p+3} \phi \tag{5.67}
\end{align*}
for all $\phi \in H^1(K)$ and $E \in H(\mathrm{curl}, K)$.

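First, we use, in succession, the definition of $\Pi^{\mathrm{curl}}_{p+3}$ in (5.65), the norm bound
(5.63) of the intermediate operator $\Pi^{c}_{p+3}$ and that of $\Pi^{\mathrm{grad}}_{p+3}$ in (5.41a), the orthogonality
of the Helmholtz decomposition which implies
\[
\| \tilde E \|^2_K + \| \operatorname{grad} \psi \|^2_K = \| E \|^2_K,
\]
and the Poincaré inequality $\| \psi \|_K \lesssim h_K \| \operatorname{grad} \psi \|_K$, to get
\begin{align*}
\| \Pi^{\mathrm{curl}}_{p+3} E \|_{H(\mathrm{curl}, K)} &\le \| \Pi^{c}_{p+3} \tilde E \|_{H(\mathrm{curl}, K)} + \| \operatorname{grad} \Pi^{\mathrm{grad}}_{p+3} \psi \|_K \\
&\lesssim \| \tilde E \|_{H(\mathrm{curl}, K)} + \| \psi \|_{H^1(K)} \lesssim \| E \|_{H(\mathrm{curl}, K)},
\end{align*}
thus proving the required bound (5.41b).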
To prove the interior moment condition (5.45), we again start by applying the
definition (5.65):

\[
\big( \Pi^{\mathrm{curl}}_{p+3} E - E, v \big)_K = \big( \Pi^{c}_{p+3} \tilde E - \tilde E, v \big)_K + \big( \operatorname{grad}(\Pi^{\mathrm{grad}}_{p+3} \psi - \psi), v \big)_K.
\]
Note that the first term on the right-hand side vanishes since $\Pi^{c}_{p+3} \tilde E$ satisfies the
moment condition (5.61) of Lemma 5.11. The last term vanishes after integrating
by parts and using the moment conditions (5.43)–(5.44) of $\Pi^{\mathrm{grad}}_{p+3}$.
Next, to prove the element boundary moment condition (5.46), starting with
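\begin{align*}
\big( n \times (\Pi^{\mathrm{curl}}_{p+3} E - E), w \big)_{\partial K} &= \big( n \times (\Pi^{c}_{p+3} \tilde E - \tilde E), w \big)_{\partial K} \\
&\quad + \big( n \times \operatorname{grad}(\Pi^{\mathrm{grad}}_{p+3} \psi - \psi), w \big)_{\partial K}, \tag{5.68}
\end{align*}
note that the first term on the right-hand side vanishes due to the moment condition
(5.62) of $\Pi^{c}_{p+3} \tilde E$. To see that the last term also vanishes, letting $e = \Pi^{\mathrm{grad}}_{p+3} \psi - \psi$,
observe that the equalities
\[
(e, \operatorname{div} q)_K = 0 = -(q, \operatorname{grad} e)_K
\]
can be seen to hold for any $q \in P_p(K)^3$ due to the moment conditions (5.43)–(5.44)
of $\Pi^{\mathrm{grad}}_{p+3}$ and integration by parts. Putting $q = \operatorname{curl} w$ for any $w \in P_{p+1}(K)^3$, the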

last equality implies that

0 = (curl 𝑤, grad 𝑒)𝐾 = −(𝑛 × grad 𝑒, 𝑤)𝜕𝐾 ,
which shows that the last term in (5.68) vanishes.
The proof of the commutativity property (5.66) is straightforward:
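\begin{align*}
\operatorname{curl} \Pi^{\mathrm{curl}}_{p+3} E &= \operatorname{curl} \Pi^{c}_{p+3} \tilde E && \text{by (5.65)} \\
&= \Pi^{\mathrm{div}}_{p+3} \operatorname{curl} \tilde E && \text{by (5.60)} \\
&= \Pi^{\mathrm{div}}_{p+3} \operatorname{curl} E && \text{by (5.64).}
\end{align*}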
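Finally, to prove the remaining commutativity property (5.67), observe that the
Helmholtz decomposition (5.64) of $E = \operatorname{grad} \phi$ gives a vanishing $\tilde E$-component
and a $\psi$-component that equals $\phi - \bar\phi$ for any $\phi \in H^1(K)$. Hence
\[
\Pi^{\mathrm{curl}}_{p+3} \operatorname{grad} \phi = \operatorname{grad} \Pi^{\mathrm{grad}}_{p+3} (\phi - \bar\phi) = \operatorname{grad} \Pi^{\mathrm{grad}}_{p+3} \phi - \operatorname{grad} \bar\phi = \operatorname{grad} \Pi^{\mathrm{grad}}_{p+3} \phi,
\]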

where we have used (5.13d).

Example 5.12 (Maxwell equations). We continue Example 4.5, where the infin-
ite-dimensional spaces and forms for the primal DPG formulation of the Maxwell
cavity problem were set by (4.32), and the formulation was proved to be wellposed.
Now we focus on its discretization using subspaces 𝑋0,ℎ ⊂ 𝑋ℎ , 𝑋ˆ ℎ ⊂ 𝑋ˆ and 𝑌ℎ ⊂ 𝑌
set by
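\begin{align*}
X_{0,h} &= \{ E_h \in \mathring H(\mathrm{curl}, \Omega) : E_h|_K \in P_p(K)^3 \text{ for all } K \in \Omega_h \}, \tag{5.69a} \\
\hat X_h &= \{ n \times \hat H_h \in H^{-1/2}(\operatorname{div}_F, \partial\Omega_h) : n \times \hat H_h|_{\partial K} \in \operatorname{tr}^K_{n\times} P_{p+1}(K)^3 \text{ for all } K \in \Omega_h \}, \\
Y_h &= \{ F_h \in H(\mathrm{curl}, \Omega_h) : F_h|_K \in N_{p+3}(K) \text{ for all } K \in \Omega_h \}. \tag{5.69b}
\end{align*}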
To obtain error estimates, we apply Theorem 5.2, under the additional assumption
that the material coefficients 𝜇, 𝜀 are constant on each mesh element 𝐾 ∈ 𝛺 ℎ . Then
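(5.41b), (5.45) and (5.46) of Theorem 5.9 verify condition (5.5) with $\Pi = \Pi^{\mathrm{curl}}_{p+3}$.
Hence we conclude that
\[
\| E - E_h \|^2_{H(\mathrm{curl}, \Omega)} + \| n \times (\hat H - \hat H_h) \|^2_{H^{-1/2}(\operatorname{div}_F, \partial\Omega_h)}
\lesssim \inf_{G_h \in X_{0,h},\ n \times \hat R_h \in \hat X_h} \Big[ \| E - G_h \|^2_{H(\mathrm{curl}, \Omega)} + \| n \times \hat H - n \times \hat R_h \|^2_{H^{-1/2}(\operatorname{div}_F, \partial\Omega_h)} \Big].
\]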

Thus the method is quasioptimal. Convergence rates can be derived by bounding

the right-hand side (as illustrated in Example 5.5). Curiously, unlike the standard
finite element method for the cavity problem, for the DPG method there appears
to be no need for ℎ to be sufficiently small to obtain quasioptimality. However,
the discrete stability of the DPG method, inherited from the wellposedness, can
deteriorate when the exact inf-sup constant 𝛾 is poor (which is to be expected as 𝜔
approaches a cavity resonance).

Bibliographical notes. The construction of the 𝐻(curl, 𝐾) Fortin operator for DPG
methods presented here is new, but is related to existing constructions. The first
𝐻(curl, 𝐾) Fortin operator was given in Carstensen et al. (2016). The construction
there uses an appropriate bubble space and is similar in spirit to our constructions
of $\Pi^{\mathrm{grad}}_{p+3}$ in Theorem 5.4 and $\Pi^{\mathrm{div}}_{p+3}$ in Theorem 5.8. Another natural method for
construction of $\Pi^{\mathrm{curl}}_{p+3}$ is through the constrained minimization (5.58), where the
required moment conditions are put as constraints. Such minimizations were used
to construct Fortin operators in Demkowicz (2024) and Demkowicz and Zanotti
(2020). The construction we have presented here is close but not identical to these,
because in proving Theorem 5.9 we needed to establish commutativity between
differently constructed Fortin operators. These techniques clearly show there are
multiple avenues to construct Fortin operators for DPG schemes.

6. A posteriori error control

The DPG method comes with a built-in error estimator. The estimator naturally
appears either from a residual minimization standpoint or through a characterization
of the method as a mixed method, as revealed in this section. The estimator can
be thought of as a hierarchical type error estimator obtained by exploiting test
functions that do not contribute to the inexact optimal test space.

6.1. Discrete residual minimization, error estimators, and mixed formulation

Let 𝑥 be as in (1.1). Consider the DPG method (5.4) for approximating 𝑥, obtained
using some finite-dimensional spaces 𝑋ℎ and 𝑌 𝑟 . Recall that following prior
notation, 𝑅𝑌 𝑟 : 𝑌 𝑟 → (𝑌 𝑟 )∗ denotes the Riesz map defined by (𝑅𝑌 𝑟 𝑦)(𝑣) = (𝑦, 𝑣)𝑌
for all 𝑦 and 𝑣 in 𝑌 𝑟 . From the definition of the computable trial-to-test operator
𝑇 𝑟 in (5.2), it is easy to see that
\[
T^r w_h = R_{Y^r}^{-1} B w_h, \qquad w_h \in X_h. \tag{6.1}
\]
Note that $R_{Y^r}^{-1}$ can be applied to $B w_h$ since it is in $Y^* \subset (Y^r)^*$. For any $z, w \in X$,
let
\[
(z, w)_r = (T^r z, T^r w)_Y, \qquad |z|_r = \| T^r z \|_Y. \tag{6.2}
\]
By (5.7), 𝑇 𝑟 is injective on 𝑋ℎ when a Fortin operator exists, so | · | 𝑟 is a norm
on 𝑋ℎ . In general, | · | 𝑟 is only a seminorm on 𝑋. Even so, whenever | · | 𝑟 is a
norm on the finite-dimensional space 𝑋ℎ , it is easy to see that there exists a unique
minimizer 𝑥 ℎ in 𝑋ℎ solving
\[
x_h = \arg\min_{z_h \in X_h} |x - z_h|_r, \tag{6.3a}
\]

which is characterized by
(𝑥 − 𝑥 ℎ , 𝑧 ℎ )𝑟 = 0 for all 𝑧 ℎ ∈ 𝑋ℎ . (6.3b)

This minimizer also minimizes a residual in a discrete dual norm and equals the
solution of the (practical) DPG method, as stated next.
Theorem 6.1 (Inexact residual minimization). Under the assumptions of The-
orem 5.2, the following are equivalent statements.
(a) 𝑥 ℎ ∈ 𝑋ℎ is the unique solution of the DPG method (5.4).
(b) 𝑥 ℎ is the unique element of 𝑋ℎ satisfying
\[
|x - x_h|_r = \inf_{z_h \in X_h} |x - z_h|_r.
\]

(c) 𝑥 ℎ minimizes the residual in the following sense:

\[
x_h = \arg\min_{z_h \in X_h} \| \ell - B z_h \|_{(Y^r)^*}.
\]

Proof. Follow along the lines of proof of Theorem 3.2 but using (6.1) instead of
(3.2) and noting (6.3).
Definition 6.2. Let ℓ be as in (1.1), let 𝑥˜ ℎ be any element of 𝑋ℎ , and let 𝑌 𝑟 be as
in (5.1). The element of 𝑌 𝑟 defined by
𝜀˜𝑟 = 𝑅𝑌−1𝑟 (ℓ − 𝐵𝑥˜ ℎ ) (6.4)
is called the inexact error representation of 𝑥˜ ℎ (see Definition 3.3). When 𝑥˜ ℎ is set
to the solution 𝑥 ℎ of the DPG method (5.4), then its inexact error representation is
denoted (omitting the tilde) by 𝜀𝑟 = 𝑅𝑌−1𝑟 (ℓ − 𝐵𝑥 ℎ ).
It is easy to see that 𝜀˜𝑟 in (6.4) is the unique element of 𝑌 𝑟 satisfying
(𝜀˜𝑟 , 𝑦)𝑌 = ℓ(𝑦) − 𝑏(𝑥˜ ℎ , 𝑦) for all 𝑦 ∈ 𝑌 𝑟 . (6.5)
This shows that the inexact error representation of the DPG solution, namely 𝜀𝑟 , is
𝑌 -orthogonal to the entire inexact optimal test space 𝑌ℎ𝑟 due to (5.4). Let
𝜂˜ = ∥ 𝜀˜𝑟 ∥𝑌 , 𝜂 = ∥𝜀𝑟 ∥𝑌 . (6.6)
Clearly, (6.1), (6.5) and (6.2) imply
𝜂˜ = ∥𝑅𝑌−1𝑟 𝐵(𝑥 − 𝑥˜ ℎ )∥𝑌 = ∥𝑇 𝑟 (𝑥 − 𝑥˜ ℎ )∥𝑌 .
When 𝑌 𝑟 is of the product form (5.1), the norm in (6.6) can be written in terms of
local element contributions, each of which acts as a practically computable element-wise
error indicator. It is useful to note the following analogue of Theorem 3.4.
Theorem 6.3 (Inexact error representation as a mixed solution component).
Let 𝜀𝑟 denote the inexact error representation of Definition 6.2. Then the following
are equivalent statements.
(a) 𝑥 ℎ ∈ 𝑋ℎ solves the DPG method (5.4).

(b) 𝑥 ℎ ∈ 𝑋ℎ and 𝜀𝑟 ∈ 𝑌 𝑟 solve the mixed formulation

(𝜀𝑟 , 𝑦)𝑌 + 𝑏(𝑥 ℎ , 𝑦) = ℓ(𝑦) for all 𝑦 ∈ 𝑌 𝑟 , (6.7a)
𝑏(𝑧, 𝜀𝑟 ) = 0 for all 𝑧 ∈ 𝑋ℎ . (6.7b)

(c) 𝜀𝑟 and 𝑥 ℎ form the saddle point of

\[
L(y, z) = \tfrac{1}{2} \| y \|_Y^2 - \ell(y) + b(z, y)
\]
on $Y^r \times X_h$, that is,
\[
L(\varepsilon^r, x_h) = \min_{y \in Y^r} \max_{z \in X_h} L(y, z).
\]

Proof. Follow along the lines of the proof of Theorem 3.4.

The mixed reformulation (6.7) of the DPG method in Theorem 6.3 gives further
insight into the stability of the method. In a typical two-equation mixed system,
enlarging the test space in the first equation, while often helpful to prove the inf-
sup condition by increasing the supremum, is fraught with the danger of losing
the coercivity of the first term. However, in the DPG system (6.7), the first term
(·, ·)𝑌 , being an inner product, will never lose coercivity, no matter how liberally
we enrich 𝑌 𝑟 . This explains any perceived ease in proving stability of DPG
formulations.
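In matrix form, the equivalences of Theorems 6.1 and 6.3 can be observed directly. The following is a minimal sketch under stated assumptions: all quantities are real, `G` is the Gram matrix of the 𝑌-inner product on some basis of 𝑌 𝑟 , `Bmat` has entries 𝑏(𝑒 𝑗 , 𝑦 𝑖 ) for bases 𝑒 𝑗 of 𝑋ℎ and 𝑦 𝑖 of 𝑌 𝑟 , `l` collects ℓ(𝑦 𝑖 ), and the function name `dpg_discrete_solve` is hypothetical. Random matrices stand in for an actual discretization.

```python
import numpy as np

def dpg_discrete_solve(G, Bmat, l):
    """Solve the mixed system (6.7) in matrix form:
        G eps + Bmat x = l,   Bmat^T eps = 0.
    Returns trial coefficients x, error-representation coefficients eps,
    and the estimator eta = ||eps||_Y of (6.6)."""
    n, k = Bmat.shape
    K = np.block([[G, Bmat], [Bmat.T, np.zeros((k, k))]])
    sol = np.linalg.solve(K, np.concatenate([l, np.zeros(k)]))
    eps, x = sol[:n], sol[n:]
    eta = np.sqrt(eps @ G @ eps)
    return x, eps, eta

rng = np.random.default_rng(2)
n, k = 8, 3
A = rng.standard_normal((n, n))
G = A @ A.T + n * np.eye(n)              # symmetric positive definite Gram matrix
Bmat = rng.standard_normal((n, k))       # injective with probability one
l = rng.standard_normal(n)
x, eps, eta = dpg_discrete_solve(G, Bmat, l)

# Residual minimization in the discrete dual norm leads to normal equations
# with the same solution: (B^T G^{-1} B) x = B^T G^{-1} l.
x_normal = np.linalg.solve(Bmat.T @ np.linalg.solve(G, Bmat),
                           Bmat.T @ np.linalg.solve(G, l))
```

Because the leading block `G` is positive definite for any choice of basis of 𝑌 𝑟 , the saddle-point matrix is invertible as soon as `Bmat` has full column rank, which is the matrix counterpart of the remark on coercivity above.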

6.2. Reliability and efficiency

The basis for a posteriori error control in DPG methods using 𝜂 is the following
result, proved under the same prior assumption on the existence of a continuous
Fortin operator 𝛱 .

Theorem 6.4 (Global reliability and efficiency for any approximation). Under
the assumptions of Theorem 5.2, we have the following inequalities for the differ-
ence between the exact solution 𝑥 and any 𝑥˜ ℎ ∈ 𝑋ℎ in terms of the corresponding
computable error estimator 𝜂˜ of (6.6):
𝛾∥𝑥 − 𝑥˜ ℎ ∥ 𝑋 ≤ ∥𝛱 ∥ 𝜂˜ + osc(ℓ) (reliability), (6.8a)
𝜂˜ ≤ ∥𝑏∥ ∥𝑥 − 𝑥˜ ℎ ∥ 𝑋 (efficiency). (6.8b)
Here
osc(ℓ) = ∥ℓ ◦ (1 − 𝛱 )∥𝑌 ∗ (6.8c)
represents a term akin to data-approximation error and it admits the following
bound:
osc(ℓ) ≤ ∥𝑏∥ ∥1 − 𝛱 ∥ min ∥𝑥 − 𝑧 ℎ ∥ 𝑋 . (6.8d)
𝑧ℎ ∈𝑋ℎ

Proof. To prove (6.8a), observe that

𝑏(𝑥 − 𝑥˜ ℎ , 𝑦) = ℓ(𝑦) − 𝑏(𝑥˜ ℎ , 𝑦)
= ℓ(𝑦 − 𝛱 𝑦) − 𝑏(𝑥˜ ℎ , 𝑦 − 𝛱 𝑦) + ℓ(𝛱 𝑦) − 𝑏(𝑥˜ ℎ , 𝛱 𝑦)
= ℓ(𝑦 − 𝛱 𝑦) + (𝜀˜𝑟 , 𝛱 𝑦)𝑌
due to (5.5) and (6.5). Hence (1.2) implies
\[
\gamma \| x - \tilde x_h \|_X \le \sup_{0 \ne y \in Y} \frac{|b(x - \tilde x_h, y)|}{\| y \|_Y}
= \sup_{0 \ne y \in Y} \frac{|\ell \circ (1 - \Pi)(y) + (\tilde\varepsilon^r, \Pi y)_Y|}{\| y \|_Y},
\]
from which (6.8a) follows.
The global efficiency estimate is immediate from (6.5):
\[
\tilde\eta^2 = b(x - \tilde x_h, \tilde\varepsilon^r) \le \| b \| \, \| x - \tilde x_h \|_X \, \tilde\eta.
\]
Finally, to prove (6.8d),
\begin{align*}
(\ell \circ (1 - \Pi))(y) &= \ell(y) - \ell(\Pi y) = b(x, y - \Pi y) \\
&= b(x - z_h, y - \Pi y) \le \| b \| \, \| x - z_h \|_X \, \| y - \Pi y \|_Y
\end{align*}
for any 𝑧 ℎ ∈ 𝑋ℎ , where we used (5.5) to get the last equality. Taking the infimum
over 𝑧 ℎ ∈ 𝑋ℎ and supremum over 0 ≠ 𝑦 ∈ 𝑌 , we obtain (6.8d).
Example 6.5 (Case of $Y^r \supseteq Y_h^{\mathrm{opt}}$). Reconsider the setting of Example 5.3 for
some $Y^r \supseteq Y_h^{\mathrm{opt}}$. As shown there, the solutions $x_h \in X_h$ of the IPG method
and the practical DPG method are identical. We also showed there that the Fortin
condition holds with $\Pi$ set to the $Y$-orthogonal projection $\Pi_{Y^r}$ into $Y^r$. Hence
Theorem 6.4 applies with $\| \Pi \| = \| \Pi_{Y^r} \| = 1$, so
𝛾∥𝑥 − 𝑥 ℎ ∥ 𝑋 ≤ 𝜂 + osc(ℓ), 𝜂 ≤ ∥𝑏∥ ∥𝑥 − 𝑥 ℎ ∥ 𝑋 . (6.9)
Here
osc(ℓ) = ∥ℓ ◦ (𝐼 − 𝛱𝑌 𝑟 )∥𝑌 ∗ (6.10)
following its definition in (6.8c).
It is interesting to compare the exact and inexact error representations, 𝜀 ∈ 𝑌 and
𝜀𝑟 ∈ 𝑌 𝑟 respectively, in this example. Consider the mixed method reformulations
of the IPG and DPG methods, namely (3.4) and (6.7) respectively. Since 𝑥 ℎ is the
same in both cases, choosing 𝑦 of (3.4a) in 𝑌 𝑟 and subtracting (6.7a), we find that
(𝜀 − 𝜀𝑟 , 𝑦)𝑌 = 0 for all 𝑦 ∈ 𝑌 𝑟 , (6.11)
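that is, $\varepsilon^r = \Pi \varepsilon$ is the $Y$-orthogonal projection of the exact error representation.
We claim that
\[
\| \varepsilon^r \|_Y^2 \le \| \varepsilon \|_Y^2 \le \| \varepsilon^r \|_Y^2 + \operatorname{osc}(\ell)^2. \tag{6.12}
\]
The first inequality above is immediate from $\varepsilon^r = \Pi \varepsilon$. The second inequality
follows from the Pythagoras theorem $\| \varepsilon \|_Y^2 = \| \varepsilon^r \|_Y^2 + \| \varepsilon - \varepsilon^r \|_Y^2$ and
\begin{align*}
\| \varepsilon - \varepsilon^r \|_Y^2 &= (\varepsilon, \varepsilon - \varepsilon^r)_Y && \text{by (6.11)} \\
&= \ell(\varepsilon - \varepsilon^r) - b(x_h, \varepsilon - \varepsilon^r) && \text{by (3.4a)} \\
&= \ell(\varepsilon - \varepsilon^r) && \text{by (6.7b) and (3.4b).}
\end{align*}
Clearly
\[
\ell(\varepsilon - \varepsilon^r) = \ell(\varepsilon - \Pi \varepsilon) \le \| \ell \circ (1 - \Pi) \|_{Y^*} \| \varepsilon - \varepsilon^r \|_Y = \operatorname{osc}(\ell) \| \varepsilon - \varepsilon^r \|_Y,
\]
and the upper inequality of (6.12) follows.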
Bibliographical notes. The proof given for Theorem 6.4 is a slightly simpli-
fied version of the original presented in Carstensen, Demkowicz and Gopala-
krishnan (2014a, Theorem 2.1), and produces an improved reliability constant as
in Carstensen, Gallistl, Hellwig and Weggler (2014b, Lemma 3.6). If in addition
𝛱 is idempotent, then (6.8a) can be further improved to
$\gamma^2 \|x - x_h\|_X^2 \le \eta^2 + \big(\eta \sqrt{\|\Pi\|^2 - 1} + \operatorname{osc}(\ell)\big)^2,$ (6.13)
as shown in Keith, Astaneh and Demkowicz (2019, Theorem 6.4). Note the
relationship between (6.12) and (6.13) when 𝛱 is an orthogonal projection. It
is easy to construct adaptive algorithms with marking strategies based on the
DPG error estimator following the usual ‘Solve → Estimate → Mark → Refine’
paradigm. In all reports of practical performance (Demkowicz et al. 2012a, Petrides
and Demkowicz 2017), such DPG algorithms work very well, but to the best of our
knowledge their convergence and optimality are yet to be rigorously proved.
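
The 'Mark' step of this loop can be sketched as follows. The paper does not prescribe a marking strategy, so the bulk (Dörfler) criterion used here, the function name `dorfler_mark` and the parameter `theta` are assumptions made purely for illustration. The sketch also assumes that the estimator splits into element contributions, $\eta^2 = \sum_K \eta_K^2$, which is natural when the test space is a product of element spaces.

```python
def dorfler_mark(eta_sq, theta=0.5):
    """Greedy bulk marking.

    eta_sq : list of squared element indicators eta_K^2 (hypothetical input)
    theta  : bulk parameter in (0, 1]
    Returns indices of a smallest set M with
        sum_{K in M} eta_K^2 >= theta * sum_K eta_K^2.
    """
    total = sum(eta_sq)
    # Visit elements from largest to smallest indicator
    order = sorted(range(len(eta_sq)), key=lambda i: eta_sq[i], reverse=True)
    marked, acc = [], 0.0
    for i in order:
        if acc >= theta * total:
            break
        marked.append(i)
        acc += eta_sq[i]
    return marked

# Four elements with hypothetical indicators
marked = dorfler_mark([4.0, 1.0, 3.0, 2.0], theta=0.5)
print(marked)
```

The marked elements would then be passed to whatever refinement routine the finite element code provides.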

7. Ultraweak formulations
A rich set of examples to apply the DPG ideas is offered by the so-called ‘ultraweak’
formulations of boundary value problems seeking 𝑢 ∈ 𝑉 satisfying 𝐴𝑢 = 𝑓 for
an 𝑓 ∈ 𝐿 2 (𝛺)𝑚 . Here 𝑉 is a space where homogeneous boundary conditions
are imposed, and 𝐴 is a general partial differential operator (specified below). In
ultraweak formulations all derivatives in 𝐴 are moved to test functions by integration
by parts, element by element. In order to use the previously developed DPG ideas, it
is important to obtain a reformulation where the trial-to-test operator 𝑇 is localized,
i.e. a formulation where the test space has the form (4.1). Such a reformulation,
set in a broken graph space, is derived and studied in this section. We prove its
wellposedness by general arguments that cover many examples at once. The first
main result (Theorem 7.6) of this section identifies conditions under which the
wellposedness of ultraweak formulations in broken graph spaces can be obtained
as soon as 𝐴 : 𝑉 → 𝐿 2 (𝛺)𝑚 is a bijection, no matter how complex the spaces of
interface variables are. Another result of this section (Theorem 7.9), which has not
appeared in previous literature, exhibits norms in which the best possible stability
of ultraweak formulations can be obtained.

Let $k, m, N \ge 1$ be integers, let $\Omega \subseteq \mathbb{R}^N$ and $\Omega_h$ be as in Section 4, let $e_i$
denote the standard Euclidean unit basis in $\mathbb{R}^m$, $w = \sum_{i=1}^m w_i e_i : \Omega \to \mathbb{C}^m$ for some
smooth functions $w_i$, and let $A$ be the partial differential operator
$A w = \sum_{i,j=1}^m \sum_{\alpha \in \mathcal{I}_k^N} e_i\, \partial^\alpha (a_{ij\alpha} w_j)$ (7.1)
for some functions $a_{ij\alpha} : \Omega \to \mathbb{C}$ indexed by $i, j = 1, \ldots, m$, and multi-indices
$\alpha \in \mathcal{I}_k^N$ (defined in (5.29)). As usual, $\partial^\alpha = \partial_1^{\alpha_1} \cdots \partial_N^{\alpha_N}$. We view $A$ as an
unbounded operator 𝐴 : dom(𝐴) ⊆ 𝐿 2 (𝛺)𝑚 → 𝐿 2 (𝛺)𝑚 . Given an 𝑓 ∈ 𝐿 2 (𝛺)𝑚 ,
we want to
find 𝑢 ∈ dom(𝐴) such that 𝐴𝑢 = 𝑓 , (7.2)
where homogeneous boundary conditions we wish to impose are incorporated into
functions in the subspace dom(𝐴). At this point, the coefficients 𝑎 𝑖 𝑗 𝛼 are allowed
to be general so long as the result of applying 𝐴 is a Schwartz distribution, that is,
we assume that
𝐴𝑢 ∈ D ′ (𝛺)𝑚 for any 𝑢 ∈ 𝐿 2 (𝛺)𝑚 . (7.3)
Of course, when 𝑢 is in dom(𝐴), 𝐴𝑢 is not merely a distribution, but is in 𝐿 2 (𝛺)𝑚 .

7.1. Graph spaces, boundary operators and their broken versions

Assume that the Schwartz space D(𝛺)𝑚 of smooth compactly supported test func-
tions in 𝛺 is contained in the domain of 𝐴:
D(𝛺)𝑚 ⊆ dom(𝐴). (7.4)
A consequence of (7.4) is that 𝐴 is densely defined, so its (maximal) adjoint 𝐴∗ is
uniquely defined as follows (see e.g. Brezis 2011 or Kato 1995). First define the
set dom(𝐴∗ ) ⊆ 𝐿 2 (𝛺)𝑚 by
dom(𝐴∗ ) = {𝑔 ∈ 𝐿 2 (𝛺)𝑚 : there is an 𝑓 ∈ 𝐿 2 (𝛺)𝑚 such that
(𝑔, 𝐴𝑢)𝛺 = ( 𝑓 , 𝑢)𝛺 for all 𝑢 ∈ dom(𝐴)}. (7.5)
Then define 𝐴∗ : dom(𝐴∗ ) ⊆ 𝐿 2 (𝛺)𝑚 → 𝐿 2 (𝛺)𝑚 by
(𝐴∗ 𝑔, 𝑢)𝛺 = (𝑔, 𝐴𝑢)𝛺 for all 𝑢 ∈ dom(𝐴) and 𝑔 ∈ dom(𝐴∗ ). (7.6)
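By virtue of assumption (7.3), for any $u \in L^2(\Omega)^m$, the distribution $Au$ is such
that its action on a $\tilde\varphi \in \mathcal{D}(\Omega)^m$ takes the form
$(Au)(\tilde\varphi) = \sum_{i,j=1}^m \sum_{\alpha \in \mathcal{I}_k^N} (-1)^{|\alpha|} (u_i, \bar a_{ji\alpha}\, \partial^\alpha \tilde\varphi_j)_\Omega.$ (7.7)
For any $u \in \mathrm{dom}(A)$, since $Au$ is in $L^2(\Omega)^m$, the left-hand side above equals
$(Au, \tilde\varphi)_\Omega$. Hence the condition in (7.5) is verified with $\tilde\varphi$ in place of $g$, that is,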
D(𝛺)𝑚 ⊆ dom(𝐴∗ ). (7.8)
In view of (7.4), (7.8) and (7.6), we can now identify $A^*$ with the formal adjoint
partial differential operator
$A^* \tilde\varphi = \sum_{i,j=1}^m \sum_{\alpha \in \mathcal{I}_k^N} e_i\, (-1)^{|\alpha|}\, \bar a_{ji\alpha}\, \partial^\alpha \tilde\varphi_j.$ (7.9)

To circumvent issues concerning products of distributions and non-smooth func-

tions, we assume that the application of this formal adjoint satisfies
𝐴∗ 𝑢˜ ∈ D ′ (𝛺)𝑚 for any 𝑢˜ ∈ 𝐿 2 (𝛺)𝑚 , (7.10)
analogous to (7.3).
Note that (7.3) and (7.10) imply that we may restrict 𝐴𝑢 and 𝐴∗ 𝑢˜ to any non-
empty open subset 𝑆 ⊆ 𝛺 to get distributions in D ′ (𝑆). This allows us to define the
following graph spaces on 𝑆:
$W(S) = \{u \in L^2(S)^m : Au \in L^2(S)^m\}, \qquad \|w\|_{W(S)}^2 = \|w\|_S^2 + \|Aw\|_S^2,$
$\tilde W(S) = \{u \in L^2(S)^m : A^* u \in L^2(S)^m\}, \qquad \|\tilde w\|_{\tilde W(S)}^2 = \|\tilde w\|_S^2 + \|A^* \tilde w\|_S^2.$

Here ∥ · ∥ 𝑆 denotes the norm of 𝐿 2 (𝑆)𝑚 ; the corresponding inner product is denoted
(·, ·)𝑆 . Our assumptions imply that these inner product spaces are complete.
Lemma 7.1. The spaces $W(S)$ and $\tilde W(S)$ are Hilbert spaces.
Proof. Consider a Cauchy sequence 𝑢 𝑛 in 𝑊(𝑆). Clearly, 𝑢 𝑛 is Cauchy in 𝐿 2 (𝑆)𝑚
and 𝐴𝑢 𝑛 is Cauchy in 𝐿 2 (𝑆)𝑚 . Hence there is a 𝑢 ∈ 𝐿 2 (𝑆)𝑚 and 𝑓 ∈ 𝐿 2 (𝑆)𝑚
such that ∥𝑢 − 𝑢 𝑛 ∥ 𝑆 → 0 and ∥ 𝑓 − 𝐴𝑢 𝑛 ∥ 𝑆 → 0. To show that 𝑢 is in 𝑊(𝑆),
we use (7.7), a consequence of assumption (7.3), to get that for any $\tilde\varphi \in \mathcal{D}(S)^m$,
$(Au_n)(\tilde\varphi) = (u_n, A^*\tilde\varphi)_S \to (u, A^*\tilde\varphi)_S = (Au)(\tilde\varphi)$ as $n \to \infty$. Since $Au_n \to f$ in
$L^2(S)^m$, this implies that the distribution $Au$ must equal $f$ in $L^2(S)^m$. This proves
the completeness of $W(S)$. The completeness of $\tilde W(S)$ is similarly proved by using
assumption (7.10) in place of (7.3).
Next, we need boundary operators, which are bounded linear operators
$D_S : W(S) \to \tilde W(S)^*$ and $\tilde D_S : \tilde W(S) \to W(S)^*$
defined by
$\langle D_S w, \tilde w \rangle_{\tilde W(S)} = (Aw, \tilde w)_S - (w, A^* \tilde w)_S,$
$\langle \tilde D_S \tilde w, w \rangle_{W(S)} = (A^* \tilde w, w)_S - (\tilde w, Aw)_S,$ (7.11)
for all $w \in W(S)$ and $\tilde w \in \tilde W(S)$. Obviously $\langle D_S w, \tilde w \rangle_{\tilde W(S)}$ is obtained by conjugating
$\langle \tilde D_S \tilde w, w \rangle_{W(S)}$ and changing sign, but note that the domains and codomains
of $D_S$ and $\tilde D_S$ are different.

When $S = \Omega$, we abbreviate $W(S)$, $\tilde W(S)$, $D_S$ and $\tilde D_S$ to $W$, $\tilde W$, $D$ and $\tilde D$
respectively. Furthermore, let $V$ and $\tilde V$ denote the linear subspaces $\mathrm{dom}(A)$ and
$\mathrm{dom}(A^*)$ made into normed spaces using $\|\cdot\|_W$ and $\|\cdot\|_{\tilde W}$ respectively, that is,
$V = (\mathrm{dom}(A), \|\cdot\|_W), \qquad \tilde V = (\mathrm{dom}(A^*), \|\cdot\|_{\tilde W}).$ (7.12)
Clearly $V \subset W$ and $\tilde V \subset \tilde W$. Given any subspace $R$ of the dual space $X^*$, we denote
its left annihilator by ${}^\perp R = \{w \in X : \langle s', w \rangle_X = 0 \text{ for all } s' \in R\}$.
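Lemma 7.2. The space $DV = \{\tilde v \in \tilde W^* : \tilde v = Dv \text{ for some } v \in V\}$ satisfies
$\tilde V = {}^\perp DV.$ (7.13)
Proof. If $\tilde v \in \tilde V = \mathrm{dom}(A^*)$, then for any $v \in V = \mathrm{dom}(A)$, by the definition of
the adjoint (7.6), we have $\langle Dv, \tilde v \rangle_{\tilde W} = (Av, \tilde v)_\Omega - (v, A^* \tilde v)_\Omega = 0$, so $\tilde V \subseteq {}^\perp DV$.
For the reverse inclusion, let
$\tilde w \in {}^\perp DV = \{\tilde w \in \tilde W : \langle Dv, \tilde w \rangle_{\tilde W} = 0 \text{ for all } v \in V\}.$
Then $f = A^* \tilde w \in L^2(\Omega)^m$ satisfies $(v, f)_\Omega - (Av, \tilde w)_\Omega = -\langle Dv, \tilde w \rangle_{\tilde W} = 0$ for
all $v \in V = \mathrm{dom}(A)$, so given the definition of $\mathrm{dom}(A^*)$ in (7.5), $\tilde w$ must be in
$\tilde V = \mathrm{dom}(A^*)$.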
For our wellposedness theorems later, we need to place an assumption which
represents an equality analogous to Lemma 7.2 but with the roles of $\tilde V$ and $V$
reversed, namely
$V = {}^\perp \tilde D \tilde V.$ (7.14)
In applications, (7.14) being a constraint on $V = \mathrm{dom}(A)$ restricts admissible
boundary conditions in (7.2), as in the theory of Friedrichs systems. Note that
(7.14) implies that $V$ is a closed subspace of $W$. Hence, for (7.14) to hold it is
necessary for $A$ to be a closed operator in $L^2(\Omega)^m$. Similarly, (7.13) of Lemma 7.2
implies, in particular, that $\tilde V$ is closed in $\tilde W$, in agreement with the fact that the
adjoint $A^*$ is a closed operator.
Next we use the mesh $\Omega_h$ to define broken graph spaces (which are generally
infinite-dimensional) by
$W_h = \prod_{K \in \Omega_h} W(K)$ and $\tilde W_h = \prod_{K \in \Omega_h} \tilde W(K).$ (7.15)

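For any $w \in W_h$, as in (4.2), letting $w|_K \equiv w_K$ denote the component of the product
function $w$ on $K$, we recall that $D_K w_K$ is in $\tilde W(K)^*$. Let $D_h : W_h \to \tilde W_h^*$ be the
continuous linear operator defined by
$\langle D_h w, \tilde w \rangle_{\tilde W_h} = \sum_{K \in \Omega_h} \langle D_K w_K, \tilde w_K \rangle_{\tilde W(K)}$ for all $w \in W_h$, $\tilde w \in \tilde W_h$
and let $\tilde D_h : \tilde W_h \to W_h^*$ be defined by
$\langle \tilde D_h \tilde w, w \rangle_{W_h} = \langle D_h w, \tilde w \rangle_{\tilde W_h}.$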

For any 𝑤 ∈ 𝑊ℎ , we let 𝐴ℎ 𝑤 denote the function obtained by applying 𝐴 to 𝑤 𝐾 ,

element by element, for all 𝐾 ∈ 𝛺 ℎ . The resulting function 𝐴ℎ 𝑤 may be viewed
as an element of the Cartesian product Π𝐾 ∈𝛺ℎ 𝐿 2 (𝐾)𝑚 , which can obviously be
embedded in 𝐿 2 (𝛺)𝑚 . This defines the map 𝐴ℎ : 𝑊ℎ → 𝐿 2 (𝛺)𝑚 . The operator
𝐴∗ℎ : 𝑊˜ ℎ → 𝐿 2 (𝛺)𝑚 is defined similarly by evaluating the action of 𝐴∗ (instead of
𝐴) element by element. Clearly
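$\langle D_h w, \tilde w \rangle_{\tilde W_h} = (A_h w, \tilde w)_\Omega - (w, A_h^* \tilde w)_\Omega$ for all $w \in W_h$, $\tilde w \in \tilde W_h$. (7.16)
The obvious norm of the Cartesian products defining $W_h$ and $\tilde W_h$ can now be
equivalently written in terms of $A_h$ and $A_h^*$:
$\|w\|_{W_h}^2 = \|w\|_\Omega^2 + \|A_h w\|_\Omega^2, \qquad \|\tilde w\|_{\tilde W_h}^2 = \|\tilde w\|_\Omega^2 + \|A_h^* \tilde w\|_\Omega^2.$ (7.17)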

Lemma 7.3. For all $w \in W$ and $\tilde w \in \tilde W$, we have
$\langle D_h w, \tilde w \rangle_{\tilde W_h} = \langle Dw, \tilde w \rangle_{\tilde W}.$
Proof. Since piecewise differential operators coincide with the global ones when
applied to functions in the unbroken spaces, $A_h w = Aw$ and $A_h^* \tilde w = A^* \tilde w$. Therefore
(7.16) implies
$\langle D_h w, \tilde w \rangle_{\tilde W_h} = (Aw, \tilde w)_\Omega - (w, A^* \tilde w)_\Omega = \langle D_\Omega w, \tilde w \rangle_{\tilde W},$
where the last equality followed from (7.11).
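
To make the role of the broken boundary operator concrete, here is a small one-dimensional sketch. The setting is an assumption chosen for illustration only: $\Omega = (0,1)$ split into two elements, real-valued polynomial data, and the model operator $A = d/dx$ with formal adjoint $A^* = -d/dx$. For unbroken $w$ the element contributions of $D_h$ sum to the global pairing, as in Lemma 7.3; when $w$ is given a jump at the interface, an extra interface term appears.

```python
def simpson(g, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    mid = 0.5 * (a + b)
    return (b - a) / 6.0 * (g(a) + 4.0 * g(mid) + g(b))

def pairing(w, dw, wt, dwt, a, b):
    """<D_K w, wt> = (A w, wt)_K - (w, A* wt)_K for A = d/dx, A* = -d/dx on K = (a, b)."""
    return simpson(lambda x: dw(x) * wt(x), a, b) + simpson(lambda x: w(x) * dwt(x), a, b)

# Hypothetical smooth data on Omega = (0, 1): w = x^2, wt = 1 + x
w, dw = (lambda x: x * x), (lambda x: 2.0 * x)
wt, dwt = (lambda x: 1.0 + x), (lambda x: 1.0)

d_global = pairing(w, dw, wt, dwt, 0.0, 1.0)
d_broken = pairing(w, dw, wt, dwt, 0.0, 0.5) + pairing(w, dw, wt, dwt, 0.5, 1.0)

# Broken trial function: add a jump of size 1 across x = 0.5 (no longer in W)
w_jump = lambda x: x * x + 1.0
d_jump = pairing(w, dw, wt, dwt, 0.0, 0.5) + pairing(w_jump, dw, wt, dwt, 0.5, 1.0)

print(d_global, d_broken, d_jump)
```

In the last case the sum equals the outer boundary value $w\tilde w\,|_0^1 = 4$ minus the jump times $\tilde w(0.5) = 1.5$.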
Lemma 7.4. The equality (7.14) implies that any $w \in W_h$ satisfying $D_h w = 0$ is
in $V$. Similarly, any $\tilde w \in \tilde W_h$ satisfying $\tilde D_h \tilde w = 0$ is in $\tilde V$. In fact,
$\tilde V = \{\tilde w \in \tilde W_h : \langle D_h z, \tilde w \rangle_{\tilde W_h} = 0 \text{ for all } z \in V\}.$ (7.18)
Proof. Let us prove (7.18) first. If $\tilde v \in \tilde V$, then for any $z \in V$, using Lemmas 7.3
and 7.2, we have
$\langle D_h z, \tilde v \rangle_{\tilde W_h} = \langle Dz, \tilde v \rangle_{\tilde W} = 0.$
Thus $\tilde V$ is contained in the set on the right-hand side of (7.18).
To prove the reverse inclusion, consider a $\tilde w \in \tilde W_h$ satisfying $\langle D_h z, \tilde w \rangle_{\tilde W_h} = 0$
for all $z \in V$. Then
$0 = \langle D_h z, \tilde w \rangle_{\tilde W_h} = (A_h z, \tilde w)_\Omega - (z, A_h^* \tilde w)_\Omega$ for all $z \in V$. (7.19)
Since $z \in V \subset W$, we can replace $A_h z$ by $Az$ above. In fact (7.19) implies that $A_h^* \tilde w$
also equals $A^* \tilde w$, because $\tilde w$ is in $\tilde W$, as we now show. The action of the distribution
$A^* \tilde w$ on any $\varphi$ in $\mathcal{D}(\Omega)^m$ satisfies
$(A^* \tilde w)(\varphi) = (A\varphi, \tilde w)_\Omega = (A_h \varphi, \tilde w)_\Omega = (\varphi, A_h^* \tilde w)_\Omega + \langle D_h \varphi, \tilde w \rangle_{\tilde W_h}.$
The last term must vanish because of the first equality of (7.19) and because
$\varphi \in \mathcal{D}(\Omega)^m \subseteq \mathrm{dom}(A) = V$ by our assumption (7.4). Hence
$|(A^* \tilde w)(\varphi)| \le \|\varphi\|_\Omega \|A_h^* \tilde w\|_\Omega$ for all $\varphi \in \mathcal{D}(\Omega)^m$,
which implies that the distribution $A^* \tilde w$ is in $L^2(\Omega)^m$. Therefore $\tilde w \in \tilde W$. Returning
to (7.19), we find that
$0 = (A_h z, \tilde w)_\Omega - (z, A_h^* \tilde w)_\Omega = (Az, \tilde w)_\Omega - (z, A^* \tilde w)_\Omega = \langle Dz, \tilde w \rangle_{\tilde W}$
for all $z \in V$. Hence $\tilde w \in {}^\perp DV$. By (7.13) of Lemma 7.2, we then conclude that
$\tilde w \in \tilde V$, thus proving (7.18).
The proof of the second statement immediately follows from (7.18). Indeed, any
$\tilde w \in \tilde W_h$ that is in the null space of $\tilde D_h$ satisfies $\langle D_h w, \tilde w \rangle_{\tilde W_h} = 0$ for all $w \in W_h$
due to the relationship between $D_h$ and $\tilde D_h$, so in particular (7.19) holds for all
$z \in V$. Hence (7.18) implies that $\tilde w$ is in $\tilde V$.
The proof of the first statement proceeds similarly, but using (7.14) in place of
(7.13).

Bibliographical notes. Graph spaces of first-order differential operators are a clas-

sical ingredient in the theory of Friedrichs systems (Friedrichs 1958). More recently
they were studied in Sheen (1992), Jensen (2004) and Ern, Guermond and Caplain
(2007). Completeness and density results were proved in Jensen (2004), where one
also finds the term ‘broken graph space’ in the context of DG methods. Analogues
of our twin equalities of (7.13) and (7.14), namely $\tilde V = {}^\perp DV$ and $V = {}^\perp \tilde D \tilde V$, prominently
feature as abstract conditions in modern takes on the theory of first-order
Friedrichs systems (Ern et al. 2007). Our presentation here, which is not restric-
ted to first-order operators, is based on Demkowicz, Gopalakrishnan, Nagaraj and
Sepúlveda (2017).

7.2. Hybrid ultraweak formulation suitable for DPG method

Now that we have broken graph spaces 𝑊ℎ , 𝑊˜ ℎ and elementwise boundary oper-
ators 𝐷 ℎ , 𝐷˜ ℎ , we can perform elementwise operations analogous to performing
integration by parts and moving all derivatives to the test functions. Namely, we
derive an ultraweak formulation by multiplying the equation 𝐴𝑢 = 𝑓 by a test
function 𝑤˜ ∈ 𝑊˜ ℎ , applying the definition of 𝐷 𝐾 on each element 𝐾, and summing
over all 𝐾 ∈ 𝛺 ℎ . Then we obtain
$(u, A_h^* \tilde w)_\Omega + \langle D_h u, \tilde w \rangle_{\tilde W_h} = (f, \tilde w)_\Omega$ (7.20)
for any $\tilde w$ in $\tilde W_h$. Now, since $D_h$ is not an injective operator in general, we define
$\hat q = D_h u$, an unknown that we want to uniquely solve for. Note that $W$ is contained
in $W_h$, so $V$ can be viewed as a subspace of the broken graph space $W_h$. Let
$D_{h,V} = D_h|_V : V \to \tilde W_h^*$ denote the restriction of $D_h$ from $W_h$ to $V$. Analogous to
the quotient norms that appeared earlier (such as (4.10) and (4.31)), we define
$Q = \operatorname{range}(D_{h,V}), \qquad \|\hat q\|_Q = \inf_{v \in D_{h,V}^{-1}\{\hat q\}} \|v\|_W,$ (7.21)
that is, the space
$Q = \{\hat q \in \tilde W_h^* : \text{there is a } v \in V \text{ satisfying } \hat q = D_h v\}$
is endowed with the minimal norm of elements in the preimage set
$D_{h,V}^{-1}\{\rho\} = \{v \in V : D_h v = \rho\},$
a quotient norm that makes $Q$ into a Hilbert space.
Using $\hat q$ in (7.20), we have completed the derivation of the following (hybrid)
ultraweak formulation of (7.2): find $u \in L^2(\Omega)^m$ and $\hat q \in Q$ such that
$(u, A_h^* \tilde w)_\Omega + \langle \hat q, \tilde w \rangle_{\tilde W_h} = F(\tilde w)$ for all $\tilde w \in \tilde W_h$, (7.22)
where $F \in \tilde W_h^*$ is set by $F(\tilde w) = (f, \tilde w)_\Omega$. This formulation can be viewed as a
hybridized version of another with the unbroken graph space $\tilde W$ as the test space.
Indeed, multiplying $Au = f$ with $\tilde w_0 \in \tilde W$ and using the definition of $D = D_\Omega$
(see (7.11)), we find that $u \in V$ solves $(u, A^* \tilde w_0)_\Omega + \langle Du, \tilde w_0 \rangle_{\tilde W} = F(\tilde w_0)$ for all
$\tilde w_0 \in \tilde W$. By Lemma 7.2, $\langle Du, \tilde w_0 \rangle_{\tilde W} = 0$ if $\tilde w_0$ is in $\tilde V$. Hence restricting to test
functions $\tilde v \in \tilde V \subset \tilde W$, we obtain an ‘unbroken ultraweak formulation’ that finds
$u \in V$ such that
$(u, A^* \tilde v)_\Omega = F(\tilde v)$ for all $\tilde v \in \tilde V$. (7.23)
Comparing with the formulation in (7.22), we see the hybrid ultraweak formulation
with broken graph spaces in (7.22) as a hybrid version of the unbroken ultraweak
formulation (7.23) obtained by introducing an ‘interface variable’, which has now
taken the abstract form of 𝑞ˆ ∈ 𝑊˜ ℎ∗ .
This suggests that the stability of the hybrid ultraweak formulation may follow from
that of the unbroken ultraweak formulation (7.23) if we can verify the conditions
of (4.12) and apply Theorem 4.3. The work to make this rigorous is completed in
Theorem 7.6 below, whose proof contains a proof of the stability of the unbroken
ultraweak formulation (7.23), as well as techniques to handle the element interface
terms to conclude the stability of the hybrid version (7.22). To fit into the setting
of (4.12), put
$X_0 = L^2(\Omega)^m, \qquad Y_0 = \tilde V,$
$\hat X = Q, \qquad Y = \tilde W_h,$ (7.24)
$b_0(u, \tilde w) = (u, A_h^* \tilde w)_\Omega, \qquad \hat b(\hat q, \tilde w) = \langle \hat q, \tilde w \rangle_{\tilde W_h}.$
The sum of the above forms,
$b((u, \hat q), \tilde w) = b_0(u, \tilde w) + \hat b(\hat q, \tilde w) = (u, A_h^* \tilde w)_\Omega + \langle \hat q, \tilde w \rangle_{\tilde W_h},$
is the ultraweak form in (7.22). We proceed to verify the conditions of (4.12) with
the above choices of spaces and forms. The next lemma generalizes an argument
we previously used to prove the ‘inf = sup’-type interface duality identities (of
Theorem 4.6) to the present scenario.

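Lemma 7.5. Assumption (7.14) implies that for all $\hat q \in Q$,
$\inf_{v \in D_{h,V}^{-1}\{\hat q\}} \|v\|_W = \sup_{0 \ne \tilde w \in \tilde W_h} \frac{|\langle \hat q, \tilde w \rangle_{\tilde W_h}|}{\|\tilde w\|_{\tilde W_h}}.$
Proof. We use two functions, $\tilde u_{\hat q}$ and $u_{\hat q}$, both obtained from $\hat q$, but by solving
two distinct boundary value problems. First, the supremum of the lemma, which
we denote by $s$, is attained by the function $\tilde u_{\hat q}$ in $\tilde W_h$ satisfying
$(A_h^* \tilde u_{\hat q}, A_h^* \tilde w)_\Omega + (\tilde u_{\hat q}, \tilde w)_\Omega = -\langle \hat q, \tilde w \rangle_{\tilde W_h}$ (7.25)
for all $\tilde w \in \tilde W_h$, and moreover, it equals
$s = \|\tilde u_{\hat q}\|_{\tilde W_h}.$ (7.26)
Note that $\hat q = D_h z$ for some $z \in V$. Hence, for any $\tilde\varphi \in \mathcal{D}(\Omega)^m$,
$-\langle \hat q, \tilde\varphi \rangle_{\tilde W_h} = -\langle D_h z, \tilde\varphi \rangle_{\tilde W_h} = -(A_h z, \tilde\varphi)_\Omega + (z, A_h^* \tilde\varphi)_\Omega = -(Az, \tilde\varphi)_\Omega + (z, A^* \tilde\varphi)_\Omega = 0$
due to (7.6), since $z \in \mathrm{dom}(A)$ and $\tilde\varphi \in \mathrm{dom}(A^*)$ by (7.8). Therefore,
choosing $\tilde w = \tilde\varphi \in \mathcal{D}(\Omega)^m$ in (7.25), the right-hand side vanishes, and we conclude
that the distribution $A(A_h^* \tilde u_{\hat q})$ is in $L^2(\Omega)^m$ and equals $-\tilde u_{\hat q}$. Hence (7.16) is
applicable with $w = A_h^* \tilde u_{\hat q}$ and we obtain
$A A_h^* \tilde u_{\hat q} + \tilde u_{\hat q} = 0,$ (7.27a)
$D_h A_h^* \tilde u_{\hat q} = \hat q.$ (7.27b)
Let $u_{\hat q} = A_h^* \tilde u_{\hat q}$. Then (7.27a) implies $A u_{\hat q} = -\tilde u_{\hat q}$, which implies $A_h^* A u_{\hat q} =
-A_h^* \tilde u_{\hat q} = -u_{\hat q}$. Combining with (7.27b), we conclude that $u_{\hat q}$ solves
$A_h^* A u_{\hat q} + u_{\hat q} = 0,$ (7.28a)
$D_h u_{\hat q} = \hat q.$ (7.28b)
Since $A u_{\hat q}$ is in $L^2(\Omega)^m$ by (7.27a), we know that $u_{\hat q}$ is in $W$. Let us now prove that
$u_{\hat q}$ is actually in $V$. By assumption (7.14), it suffices to prove that $u_{\hat q}$ is in ${}^\perp \tilde D \tilde V$.
For any $\tilde v$ in $\tilde V$, Lemma 7.3 implies
$\langle D u_{\hat q}, \tilde v \rangle_{\tilde W} = \langle D_h u_{\hat q}, \tilde v \rangle_{\tilde W_h} = \langle \hat q, \tilde v \rangle_{\tilde W_h} = \langle D_h z, \tilde v \rangle_{\tilde W_h} = \langle Dz, \tilde v \rangle_{\tilde W}.$
The last term is zero by Lemma 7.2. Hence $u_{\hat q}$ is in ${}^\perp \tilde D \tilde V = V$.
Thus $u_{\hat q}$ is in the set $D_{h,V}^{-1}\{\hat q\}$ over which the infimum of the lemma is taken.
We claim that the infimum of the lemma is achieved by $u_{\hat q}$. Standard variational
arguments show that the infimum is attained by a unique minimizer $v_{\hat q} \in V$ satisfying
$D_h v_{\hat q} = \hat q$ and $(A_h v_{\hat q}, A_h v)_\Omega + (v_{\hat q}, v)_\Omega = 0$ for all $v \in D_{h,V}^{-1}\{0\}$. Choosing a
$v$ in $\mathcal{D}(K)^m$, whose extension by zero is in $D_{h,V}^{-1}\{0\}$, we conclude that the distribution
$A^*(A_h v_{\hat q})|_K$ is in $L^2(K)^m$ for any $K \in \Omega_h$. Therefore $A_h^* A_h v_{\hat q}$ is in $L^2(\Omega)^m$ and
satisfies (7.28), so $v_{\hat q} = u_{\hat q}$.
To complete the proof, it now suffices to show that
$\|u_{\hat q}\|_W = \|\tilde u_{\hat q}\|_{\tilde W_h},$ (7.29)
since the left-hand side equals the infimum, as we just established, and the right-
hand side equals the supremum by (7.26). But (7.29) is obvious from $u_{\hat q} = A_h^* \tilde u_{\hat q}$
and $A u_{\hat q} = -\tilde u_{\hat q}$.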
Theorem 7.6 (Wellposedness of the hybrid ultraweak formulation). Let 𝐴 be
the partial differential operator in (7.1) satisfying (7.3), (7.4) and (7.10). If, in
addition, (7.14) holds and
𝐴 : 𝑉 → 𝐿 2 (𝛺)𝑚 is a bijection, (7.30)
then the ultraweak formulation (7.22) is wellposed. Moreover, if 𝐹(𝑣) = ( 𝑓 , 𝑣)𝛺
for some 𝑓 ∈ 𝐿 2 (𝛺)𝑚 , then the unique 𝑢 and 𝑞ˆ satisfying (7.22) are such that 𝑢
solves (7.2), 𝑢 is in 𝑉, and 𝑞ˆ satisfies 𝑞ˆ = 𝐷 ℎ 𝑢.
Proof. To apply Theorem 4.3 for the current setting (7.24), we need to verify its
conditions (4.12b) and (4.12c). In the present case, this task requires us to prove
that there are positive constants $c_0$, $\hat c$ such that
$c_0 \|u\|_\Omega \le \sup_{0 \ne y_0 \in Y_0} \frac{|(u, A_h^* y_0)_\Omega|}{\|y_0\|_{\tilde W_h}}$ for all $u \in L^2(\Omega)^m$, (7.31)
$\hat c\, \|q\|_Q \le \sup_{0 \ne \tilde w \in \tilde W_h} \frac{|\langle q, \tilde w \rangle_{\tilde W_h}|}{\|\tilde w\|_{\tilde W_h}}$ for all $q \in Q$, (7.32)
where
$Y_0 = \{y \in \tilde W_h : \langle \hat r, y \rangle_{\tilde W_h} = 0 \text{ for all } \hat r \in Q\}.$
By (7.18) of Lemma 7.4 and the definition of $Q$, we have
$\tilde V = \{\tilde w \in \tilde W_h : \langle \hat r, \tilde w \rangle_{\tilde W_h} = 0 \text{ for all } \hat r \in Q\},$
that is,
$Y_0 = \tilde V.$ (7.33)
Since (7.32) follows with $\hat c = 1$ from Lemma 7.5, we focus on (7.31), which
amounts to proving the stability of the unbroken ultraweak formulation (7.23).
As already noted, (7.14) implies that $A$ is a closed operator. By the Closed
Range Theorem for closed operators, if $A : \mathrm{dom}(A) \to L^2(\Omega)^m$ is a bijection, then
$A^* : \mathrm{dom}(A^*) \to L^2(\Omega)^m$ is also a bijection. Hence, assumption (7.30) and (7.33)
imply that $A^* : Y_0 \to L^2(\Omega)^m$ is a bijection. Thus there is a constant $c > 0$ such
that $\|y\|_\Omega \le c \|A^* y\|_\Omega$ for all $y \in Y_0$. Moreover, given any $u \in L^2(\Omega)^m$, there is
a $y_u \in Y_0$ such that $A^* y_u = u$. Choosing $y_0 = y_u$ and using
$\|y_u\|_{\tilde W_h}^2 = \|y_u\|_\Omega^2 + \|A^* y_u\|_\Omega^2 \le (c^2 + 1) \|A^* y_u\|_\Omega^2$, the supremum in (7.31), for any
given $u$, admits the bound
$\sup_{0 \ne y_0 \in Y_0} \frac{|(u, A_h^* y_0)_\Omega|}{\|y_0\|_{\tilde W_h}} \ge \frac{\|A^* y_u\|_\Omega^2}{(c^2 + 1)^{1/2} \|A^* y_u\|_\Omega},$
and (7.31) follows with $c_0 = 1/(c^2 + 1)^{1/2}$. Hence, applying Theorem 4.3, the proof
is complete.
Example 7.7 (Schrödinger equation). This is an example of a second-order op-
erator for which the previous theory applies. Let 𝜕𝑥 𝑥 denote the Laplacian with
respect to a spatial variable 0 < 𝑥 < 𝐿, let 𝜕𝑡 denote the derivative 𝜕/𝜕𝑡 with
respect to 0 < 𝑡 < 𝑇 (where both 𝐿 and 𝑇 are finite), let 𝛺 = (0, 𝐿) × (0, 𝑇),
and let 𝑓 ∈ 𝐿 2 (𝛺). The classical form of the Schrödinger initial boundary value
problem is
𝚤ˆ𝜕𝑡 𝑢 − 𝜕𝑥 𝑥 𝑢 = 𝑓 , 0 < 𝑥 < 𝐿, 0 < 𝑡 < 𝑇, (7.34a)
𝑢(𝑥, 𝑡) = 0, 𝑥 = 0 or 𝑥 = 𝐿, 0 < 𝑡 < 𝑇, (7.34b)
𝑢(𝑥, 0) = 0, 0 < 𝑥 < 𝐿. (7.34c)
Here 𝑓 is any given function in 𝐿 2 (𝛺). Viewing 𝛺 as a rectangle with time as
the vertical axis, let 𝛤 denote the union of vertical boundary walls and the bottom
initial time slice, and let 𝛤˜ denote the union of vertical boundary walls and the top
final time slice. Then the initial and boundary conditions together can be written
as 𝑢| 𝛤 = 0.
To fit into the previous framework, set
𝑘 = 2, 𝑚 = 1, 𝐴 = 𝚤ˆ𝜕𝑡 − 𝜕𝑥 𝑥 .
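Then the formal adjoint expression in (7.9) reads $A^* = \hat\imath \partial_t - \partial_{xx} = A$. Hence the
graph spaces are
$W = \tilde W = \{u \in L^2(\Omega) : \hat\imath \partial_t u - \partial_{xx} u \in L^2(\Omega)\},$
and the boundary operator $\tilde D = D : W \to W^*$ is set by $\langle Dw, v \rangle_W = (Aw, v)_\Omega -
(w, Av)_\Omega$ for all $w, v \in W$. As usual, let $\mathcal{D}(\bar\Omega)$ denote the restrictions of functions
from $\mathcal{D}(\mathbb{R}^N)$ to $\Omega$. Integration by parts shows that
$\langle D\phi, \psi \rangle_W = \int_{\partial\Omega} \hat\imath\, n_t\, \phi \bar\psi + \int_{\partial\Omega} \phi\, n_x\, \partial_x \bar\psi - \int_{\partial\Omega} n_x\, \partial_x \phi\, \bar\psi$ (7.35)
for all $\phi, \psi \in \mathcal{D}(\bar\Omega)$, where we have used the spatial and temporal components
$n_x, n_t$ of the outward unit normal $n$ on $\partial\Omega$.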
To incorporate the boundary and initial conditions into $\mathrm{dom}(A)$, circumventing
the development of a full trace theory for the graph space, we first set
$\tilde{\mathcal{V}} = \{\varphi \in \mathcal{D}(\bar\Omega) : \varphi|_{\tilde\Gamma} = 0\},$
and use it to set
$\mathrm{dom}(A) = \{u \in W : \langle Dv, u \rangle_W = 0 \text{ for all } v \in \tilde{\mathcal{V}}\},$ (7.36)
or equivalently
$V = {}^\perp D \tilde{\mathcal{V}}.$ (7.37)
For smooth $u \in \mathcal{D}(\bar\Omega) \cap \mathrm{dom}(A)$, the integration-by-parts formula (7.35) shows that
$u|_\Gamma$ must vanish. Note that assumptions (7.3), (7.4) and (7.10) are immediately
verified. By Lemma 7.2, the domain of the maximal adjoint is given by
$\tilde V = {}^\perp DV.$ (7.38)
It is shown in Demkowicz et al. (2017, Theorem 3.1) that $\tilde{\mathcal{V}}$ is dense in $\tilde V$. Hence
(7.37) implies $V = {}^\perp D \tilde V$ and assumption (7.14) is also verified.
To conclude wellposedness of the ultraweak formulation (7.22) for this Schrödinger
problem, the only remaining assumption we need to verify is the bijectivity
stated in (7.30). For each natural number $k \ge 1$, let $\phi_k \in H_0^1(0, L)$ and $\lambda_k > 0$ be a
Laplace eigenpair satisfying $-\partial_{xx} \phi_k = \lambda_k \phi_k$, normalized so that $\|\phi_k\|_{(0,L)} = 1$.
Suppose $f \in L^2(\Omega)$ and
$f_k(t) = \int_0^L f(x, t)\, \bar\phi_k(x)\, dx, \qquad u_k(t) = -\hat\imath \int_0^t e^{\hat\imath \lambda_k (t - s)} f_k(s)\, ds,$ (7.39a)
$F_M(x, t) = \sum_{k=1}^M f_k(t)\, \phi_k(x), \qquad U_M(x, t) = \sum_{k=1}^M u_k(t)\, \phi_k(x).$ (7.39b)
It is immediately verified that $A U_M = F_M$. Since $U_M$ and any $\varphi \in \tilde{\mathcal{V}}$ are smooth
enough for integration by parts using $\varphi|_{\tilde\Gamma} = 0$ and $U_M|_\Gamma = 0$, we have
$(\hat\imath \partial_t U_M, \varphi)_\Omega = (U_M, \hat\imath \partial_t \varphi)_\Omega,$
$(\partial_{xx} U_M, \varphi)_\Omega = (U_M, \partial_{xx} \varphi)_\Omega.$
Hence $\langle D\varphi, U_M \rangle_W = (A\varphi, U_M)_\Omega - (\varphi, A U_M)_\Omega = 0$ for all $\varphi \in \tilde{\mathcal{V}}$. By (7.37), this
implies that $U_M$ is in $V$.
To prove that $A$ is surjective, it now suffices to show that the limit $u$ of $U_M$ exists
in $V$ and solves $Au = f$. Note that $U_M$ is a Cauchy sequence in $V$. Indeed, for any
$N > M$, by (7.39),
$\|U_M - U_N\|_\Omega^2 = \sum_{k=M+1}^N \int_0^T |u_k(t)|^2\, dt \le \frac{1}{2} T^2 \sum_{k=M+1}^\infty \int_0^T |f_k(t)|^2\, dt,$
$\|A(U_M - U_N)\|_\Omega^2 = \|F_M - F_N\|_\Omega^2 \le \sum_{k=M+1}^\infty \int_0^T |f_k(t)|^2\, dt,$
both of which converge to 0 as $M \to \infty$, because $f \in L^2(\Omega)$. Thus, having shown
that $U_M$ is a Cauchy sequence in $V$, and since $V$ is closed in $W$, we conclude that it
converges to a limit $u$ in $V$. Moreover, since $Au$ and $f$ are $L^2(\Omega)$-limits of the same sequence
$F_M = A U_M$, we have $Au = f$. Thus $A : V \to L^2(\Omega)$ is surjective.
That $A$ is in fact a bijection can be shown in many ways. For example, we can
use an argument, completely analogous to the above, but now using (7.38) and
with $u_k$ defined by integrals from $T$ to $t$, to show that $A^* : \tilde V \to L^2(\Omega)$ is
also surjective. Since $\ker(A) = {}^\perp \operatorname{range}(A^*)$ this implies that $A : V \to L^2(\Omega)$ is
injective, thus completing the verification of (7.30).
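
The modal solution formula in (7.39a) and the bound used in the Cauchy sequence argument can be checked numerically. The sketch below is only an illustration under assumed data: $L = 1$, $T = 0.5$, mode $k = 3$ with $\lambda_k = (k\pi/L)^2$, and the made-up coefficient $f_k \equiv 1$, for which the integral has the closed form $u_k(t) = (1 - e^{\hat\imath \lambda_k t})/\lambda_k$. The helper name `duhamel` and the trapezoid quadrature are choices of this sketch, not of the paper.

```python
import cmath
import math

def duhamel(f, lam, t, n=1000):
    """Trapezoid approximation of u(t) = -i * int_0^t exp(i*lam*(t-s)) f(s) ds."""
    if t == 0.0:
        return 0j
    h = t / n
    total = 0j
    for j in range(n + 1):
        weight = 0.5 if j in (0, n) else 1.0
        s = j * h
        total += weight * cmath.exp(1j * lam * (t - s)) * f(s)
    return -1j * h * total

# Hypothetical data
L, T, k = 1.0, 0.5, 3
lam = (k * math.pi / L) ** 2          # Dirichlet eigenvalue of -d^2/dx^2 on (0, L)
f_k = lambda s: 1.0                   # assumed modal coefficient of f
u_exact = lambda t: (1.0 - cmath.exp(1j * lam * t)) / lam

# Quadrature versus closed form at one time instant
err = abs(duhamel(f_k, lam, 0.3) - u_exact(0.3))

# Check int_0^T |u_k|^2 dt <= (T^2 / 2) * int_0^T |f_k|^2 dt by the trapezoid rule in t
m = 50
ts = [T * j / m for j in range(m + 1)]
vals = [abs(duhamel(f_k, lam, t)) ** 2 for t in ts]
lhs = (T / m) * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
rhs = 0.5 * T ** 2 * T                # since |f_k| = 1 on (0, T)
print(err, lhs, rhs)
```

The bound is far from sharp for this oscillatory mode, which is consistent with its use only as a summability estimate.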

Example 7.8 (Poisson equation in first-order form). Reconsidering the Dirichlet
boundary value problem (4.5) of Example 4.2, we now develop a different variational
formulation for it. Reformulating $-\Delta u = f$ into a first-order system by
introducing the flux $q = -\operatorname{grad} u$, we obtain
$q + \operatorname{grad} u = 0$ in $\Omega$,
$\operatorname{div} q = f$ in $\Omega$,
$u = 0$ on $\partial\Omega$.
Using a group variable $v = (q, u) \in L^2(\Omega)^N \times L^2(\Omega)$, consider the unbounded
operator
$Av \equiv A(q, u) = (q + \operatorname{grad} u, \operatorname{div} q), \qquad k = 1, \quad m = N + 1,$
$\mathrm{dom}(A) = H(\operatorname{div}, \Omega) \times \mathring{H}^1(\Omega).$
We easily see that the adjoint operator, acting on $\tilde v = (\tilde q, \tilde u) \in L^2(\Omega)^N \times L^2(\Omega)$, is
$A^* \tilde v \equiv A^*(\tilde q, \tilde u) = (\tilde q - \operatorname{grad} \tilde u, -\operatorname{div} \tilde q),$
$\mathrm{dom}(A^*) = \mathrm{dom}(A).$
Clearly assumptions (7.3), (7.4) and (7.10) hold for this example. Since $V = \tilde V$
and $D = \tilde D$, the assumption (7.14) also holds since it is the same as the conclusion
(7.13) of Lemma 7.2.
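
The adjoint formula above can be checked on a one-dimensional analogue. The following sketch assumes $\Omega = (0,1)$, real-valued polynomial data chosen arbitrarily, and $u$, $\tilde u$ vanishing at both endpoints so that $(q,u)$ and $(\tilde q, \tilde u)$ lie in the analogues of $\mathrm{dom}(A)$ and $\mathrm{dom}(A^*)$. It verifies $(A(q,u), (\tilde q, \tilde u))_\Omega = ((q,u), A^*(\tilde q, \tilde u))_\Omega$ with Simpson's rule, which is exact for the low-degree integrands used.

```python
def simpson(g, a=0.0, b=1.0):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    mid = 0.5 * (a + b)
    return (b - a) / 6.0 * (g(a) + 4.0 * g(mid) + g(b))

# Hypothetical data on (0, 1); u and ut vanish at x = 0 and x = 1
q,  dq  = (lambda x: x),             (lambda x: 1.0)
u,  du  = (lambda x: x * (1.0 - x)), (lambda x: 1.0 - 2.0 * x)
qt, dqt = (lambda x: 1.0 + x),       (lambda x: 1.0)
ut, dut = (lambda x: x * (1.0 - x)), (lambda x: 1.0 - 2.0 * x)

# A(q, u) = (q + u', q') paired with (qt, ut)
lhs = simpson(lambda x: (q(x) + du(x)) * qt(x)) + simpson(lambda x: dq(x) * ut(x))

# (q, u) paired with A*(qt, ut) = (qt - ut', -qt')
rhs = simpson(lambda x: q(x) * (qt(x) - dut(x))) + simpson(lambda x: u(x) * (-dqt(x)))

print(lhs, rhs)
```

Dropping the boundary condition on $u$ or $\tilde u$ would leave a boundary term, which is exactly what the operator $D$ records.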
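Note that if $(q, u)$ and $A(q, u)$ are both in $L^2(\Omega)^m$, then obviously $\operatorname{div} q \in L^2(\Omega)$
and $\operatorname{grad} u \in L^2(\Omega)^N$, so
$W = \tilde W = H(\operatorname{div}, \Omega) \times H^1(\Omega),$
$W_h = \tilde W_h = H(\operatorname{div}, \Omega_h) \times H^1(\Omega_h),$
using the broken Sobolev spaces defined in (4.4) and (4.33). Hence, for any
$(q, u) \in W_h$ and $(\tilde q, \tilde u) \in \tilde W_h$,
$\langle D_h(q, u), (\tilde q, \tilde u) \rangle_{\tilde W_h} = \sum_{K \in \Omega_h} \langle q \cdot n, \tilde u \rangle_{H^{1/2}(\partial K)} + \langle u, \tilde q \cdot n \rangle_{H^{-1/2}(\partial K)}$
$\qquad \equiv \langle q \cdot n, \tilde u \rangle_h + \langle u, \tilde q \cdot n \rangle_h,$ (7.40)
where in the last step we have extended the previous notation $\langle \cdot, \cdot \rangle_h$ of (4.8)
to include sums of duality pairings in both $H^{1/2}(\partial K)$ and $H^{-1/2}(\partial K)$. Since any
$(q, u) \in V = \mathrm{dom}(A)$ satisfies $u|_{\partial\Omega} = 0$ on the global boundary, its element-by-
element trace $\operatorname{tr}(u)$, as defined in (4.34), lies in
$\mathring{H}^{1/2}(\partial\Omega_h) = \{\hat w \in H^{1/2}(\partial\Omega_h) : \hat w|_{\partial\Omega} = 0\} = \operatorname{tr} \mathring{H}^1(\Omega).$
The (hybrid) ultraweak formulation (7.22) now takes the form (1.1) with the
following forms:
$b((q, u, \hat u, \hat q_n), (r, v)) = (q, r)_h - (u, \operatorname{div} r)_h + \langle \hat u, r \cdot n \rangle_h - (q, \operatorname{grad} v)_h + \langle v, \hat q_n \rangle_h,$
$\ell(r, v) = (f, v)_\Omega,$
where $(q, u) \in L^2(\Omega)^N \times L^2(\Omega)$, $(r, v) \in \tilde W_h$, $\hat u \in \mathring{H}^{1/2}(\partial\Omega_h)$ and $\hat q_n \in H^{-1/2}(\partial\Omega_h)$.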
By Theorem 7.6, this ultraweak formulation is wellposed if 𝐴 : 𝑉 → 𝐿 2 (𝛺) 𝑁 +1
is a bijection, that is, if there is a unique 𝑞 ∈ 𝐻(div, 𝛺) and 𝑢 ∈ 𝐻˚ 1 (𝛺) satisfying
𝑞 + grad 𝑢 = 𝐺 on 𝛺, (7.41a)
div 𝑞 = 𝐹 on 𝛺 (7.41b)
for any given $F \in L^2(\Omega)$ and $G \in L^2(\Omega)^N$. To verify this condition, it is sufficient
to note that $q$ and $u$ satisfy (7.41) if and only if they form the unique solution
of the well-known mixed weak problem (see e.g. Brezzi and Fortin 1991, Ch. II,
Prop. 1.3) to find $q$ in $H(\operatorname{div}, \Omega)$ and $u$ in $L^2(\Omega)$ such that
$(q, r)_\Omega - (u, \operatorname{div} r)_\Omega = (G, r)_\Omega$ for all $r \in H(\operatorname{div}, \Omega)$, (7.42a)
$(\operatorname{div} q, w)_\Omega = (F, w)_\Omega$ for all $w \in L^2(\Omega)$. (7.42b)
It is easy to see that (7.42a) also implies that $u \in \mathring{H}^1(\Omega)$ and (7.41a) holds. Hence
the unique solution $(q, u)$ of (7.42) is in $V$ and solves $A(q, u) = (G, F)$, thus
verifying assumption (7.30).
Bibliographical notes. Theorem 7.6 and the treatment of the Schrödinger equation
(Example 7.7) by DPG methods appeared first in Demkowicz et al. (2017). There
it is also pointed out why it is not advisable to split the Schrödinger equation
into a first-order system. This is the reason for staying with the original second-
order form of the Schrödinger equation while deriving the ultraweak formulation in
Example 7.7. The wellposedness result in Example 7.8 was first proved in Demko-
wicz and Gopalakrishnan (2011a), but using different techniques. An application
of Theorem 7.6 to the spacetime wave equation can be found in Gopalakrishnan
and Sepúlveda (2019). That paper also notes how the spacetime wave operator
produces a 𝐷 ℎ operator with a non-trivial null space (even though 𝑞ˆ = 𝐷 ℎ 𝑢 can
be uniquely determined) and how one overcomes the consequent difficulties in
practically solving an ultraweak discretization.

7.3. Analysis with scaled and optimal norms

In practice, it is often useful to introduce a scaling parameter to tune the norm in
which the residual is minimized. In this subsection we consider the case where the
terms in the test space norm we have been working with (see (7.17)) are differently
weighted. We continue to use the notation from the previous subsections, e.g. $\tilde W_h$
is as in (7.15) and $Q$ is as in (7.21), but now consider a new test norm on $\tilde W_h$,
defined for any $0 < s < \infty$, by
$\|\tilde w\|_{Y,s}^2 = s^{-2} \|\tilde w\|_\Omega^2 + \|A_h^* \tilde w\|_\Omega^2, \qquad \tilde w \in Y = \tilde W_h,$
and a new $s$-dependent norm on the trial space by
$\|(w, \hat r)\|_{X,s}^2 = \inf_{v \in D_{h,V}^{-1}\{\hat r\}} \big( \|w - v\|_\Omega^2 + s^2 \|Av\|_\Omega^2 \big), \qquad (w, \hat r) \in X = L^2(\Omega)^m \times Q.$
For any fixed $s > 0$, the test norm $\|\tilde w\|_{Y,s}$ is obviously equivalent to $\|\tilde w\|_{\tilde W_h}$. The
fact that $\|(w, \hat r)\|_{X,s}$ is a norm follows from the next result. In these norms, the
ultraweak form
$b((w, \hat r), \tilde w) = (w, A_h^* \tilde w)_\Omega + \langle \hat r, \tilde w \rangle_{\tilde W_h}$
becomes a generalized duality pairing (of Definition 3.5), as shown next. Consequently,
the energy norm (of Definition 3.1) on $X$ for the ultraweak formulation
with the test norm $\|\cdot\|_{Y,s}$ is $\|(\cdot, \cdot)\|_{X,s}$, and simultaneously, the optimal test norm
(of Definition 3.5) on $Y$ corresponding to the trial norm $\|(\cdot, \cdot)\|_{X,s}$ is $\|\cdot\|_{Y,s}$.
Theorem 7.9 (Optimal norms for ultraweak formulations). Adopt the setting
and assumptions of Theorem 7.6 and let $X = L^2(\Omega)^m \times Q$ and $Y = \tilde W_h$. Then, for
all $(v, \hat v) \in X$ and $\tilde w \in Y$,
$\|(v, \hat v)\|_{X,s} = \sup_{0 \ne \tilde w \in Y} \frac{|b((v, \hat v), \tilde w)|}{\|\tilde w\|_{Y,s}}, \qquad \|\tilde w\|_{Y,s} = \sup_{0 \ne (v, \hat v) \in X} \frac{|b((v, \hat v), \tilde w)|}{\|(v, \hat v)\|_{X,s}}.$ (7.43)
In these norms, both $\|b\|$ and the inf-sup constant $\gamma$ are one. The approximation
$(u_h, \hat q_h) \in X_h$ from the ideal DPG method to the ultraweak solution $(u, \hat q)$ using
any $X_h \subset X$ is the best in the sense that
$\|(u - u_h, \hat q - \hat q_h)\|_{X,s} = \inf_{(w_h, \hat r_h) \in X_h} \|(u - w_h, \hat q - \hat r_h)\|_{X,s}.$ (7.44)
Proof. We need only prove the second equality in (7.43). The first equality of
(7.43) then follows from the second by Proposition 3.6, and moreover, (7.44) then
follows from Theorem 3.2(b).
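Let $\tilde w \in \tilde W_h$. We will produce a $(w, \hat r) \in X$ satisfying
$b((w, \hat r), \tilde w) = \|\tilde w\|_{Y,s}^2, \qquad \|(w, \hat r)\|_{X,s} \le \|\tilde w\|_{Y,s}.$ (7.45)
By virtue of (7.30), there is a $z \in V$ such that
$Az = \tilde w.$ (7.46)
Then
$\|\tilde w\|_{Y,s}^2 = (s^{-2} \tilde w, \tilde w)_\Omega + (A_h^* \tilde w, A_h^* \tilde w)_\Omega$
$\qquad = (A(s^{-2} z), \tilde w)_\Omega + (A_h^* \tilde w, A_h^* \tilde w)_\Omega$
$\qquad = (s^{-2} z, A_h^* \tilde w)_\Omega + \langle D_h(s^{-2} z), \tilde w \rangle_{\tilde W_h} + (A_h^* \tilde w, A_h^* \tilde w)_\Omega$
$\qquad = b((w, \hat r), \tilde w)$
with
$w = s^{-2} z + A_h^* \tilde w \in L^2(\Omega)^m, \qquad \hat r = D_h(s^{-2} z) \in Q.$ (7.47)
Moreover, since $s^{-2} z \in D_{h,V}^{-1}\{\hat r\}$,
$\|(w, \hat r)\|_{X,s}^2 = \inf_{v \in D_{h,V}^{-1}\{\hat r\}} \big( \|w - v\|_\Omega^2 + s^2 \|Av\|_\Omega^2 \big)$
$\qquad \le \|w - s^{-2} z\|_\Omega^2 + s^2 \|A(s^{-2} z)\|_\Omega^2$
$\qquad = \|A_h^* \tilde w\|_\Omega^2 + s^{-2} \|\tilde w\|_\Omega^2 = \|\tilde w\|_{Y,s}^2,$
where we have used the formulas for $w$ and $z$, from (7.47) and (7.46) respectively,
in the last step. This proves (7.45), from which it readily follows that
$\sup_{0 \ne (v, \hat v) \in X} \frac{|b((v, \hat v), \tilde w)|}{\|(v, \hat v)\|_{X,s}} \ge \frac{|b((w, \hat r), \tilde w)|}{\|(w, \hat r)\|_{X,s}} \ge \|\tilde w\|_{Y,s}.$ (7.48)
In fact the supremum equals $\|\tilde w\|_{Y,s}$ because the reverse inequality also holds,
as we now show. Letting $(v, \hat v)$ be any element in $X$ and choosing any $z \in V$ such
that $\hat v = D_h z$,
$b((v, \hat v), \tilde w) = (v, A_h^* \tilde w)_\Omega + \langle D_h z, \tilde w \rangle_{\tilde W_h}$
$\qquad = (v, A_h^* \tilde w)_\Omega - (z, A_h^* \tilde w)_\Omega + (Az, \tilde w)_\Omega$
$\qquad = (v - z, A_h^* \tilde w)_\Omega + (Az, \tilde w)_\Omega$
$\qquad \le \big( \|v - z\|_\Omega^2 + s^2 \|Az\|_\Omega^2 \big)^{1/2} \|\tilde w\|_{Y,s}.$
Taking the infimum over all $z \in D_{h,V}^{-1}\{\hat v\}$, we obtain
$b((v, \hat v), \tilde w) \le \|(v, \hat v)\|_{X,s}\, \|\tilde w\|_{Y,s},$
which together with (7.48) proves the second equality of (7.43).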

The trial norm $\|\cdot\|_{X,s}$ of Theorem 7.9 is related to Peetre’s $K$-functional (Bergh
and Löfström 1976). To see this, suppose there is a $C_V > 0$ such that
$\|v\|_\Omega \le C_V \|Av\|_\Omega$ for all $v \in V$. (7.49)
Assumption (7.30) certainly implies the existence of such a $C_V$ (by the Closed
Range Theorem). Let $V_0 = \{w_h \in W_h : D_h w_h = 0\}$ be the kernel of $D_h$, which is
a closed subspace by the continuity of $D_h$. By Lemma 7.4, $V_0$ is a subspace of $V$.
Hence, for any $\hat u \in Q$, the set $D_{h,V}^{-1}\{\hat u\}$ equals the affine translate $v_{\hat u} + V_0$ for any
$v_{\hat u} \in V$ with the property $D_h v_{\hat u} = \hat u$. Minimization over this closed coset gives a
minimal extension $E\hat u$ of $\hat u$ defined by
$E\hat u = \arg\min_{v \in D_{h,V}^{-1}\{\hat u\}} \|Av\|_\Omega.$
Note that by (7.49), $\|Av\|_\Omega$ is a norm on $D_{h,V}^{-1}\{\hat u\} \subset V$. Since $E$ is defined through
minimization over a translate of $V_0$, we see that
$(AE\hat u, A v_0)_\Omega = 0$ for all $v_0 \in V_0$. (7.50)
Define the $K$-functional for the scale of spaces between $V_0$ and $L^2(\Omega)^m$ by
$K(s, w) = \inf_{v_0 \in V_0} \big( \|w - v_0\|_\Omega^2 + s^2 \|A v_0\|_\Omega^2 \big)$ (7.51)
for any $w \in L^2(\Omega)^m$. The next two results help us better understand the norm
$\|\cdot\|_{X,s}$ in Theorem 7.9.
Proposition 7.10. Suppose (7.49) holds. Then, for any $(u, \hat u) \in X$,
$\|(u, \hat u)\|_{X,s}^2 = s^2 \|AE\hat u\|_\Omega^2 + K(s, u - E\hat u).$
Proof. Writing any $v \in D_{h,V}^{-1}\{\hat u\}$ as $v = E\hat u + v_0$ for a $v_0 \in V_0$,
$\|u - v\|_\Omega^2 + s^2 \|Av\|_\Omega^2 = \|u - E\hat u - v_0\|_\Omega^2 + s^2 \|A(E\hat u + v_0)\|_\Omega^2$
$\qquad = \|u - E\hat u - v_0\|_\Omega^2 + s^2 \|AE\hat u\|_\Omega^2 + s^2 \|A v_0\|_\Omega^2,$
where the last equality is due to (7.50). Hence the result follows by minimizing
over $v_0 \in V_0$.
Proposition 7.11. Let $C_V$ be as in (7.49), $c_s = C_V^2/s^2$, and $k_s = \tfrac12\bigl(c_s + \sqrt{c_s^2 + 4c_s}\bigr)$. Then, for all $(u,\hat u) \in X$ and $s > 0$, we have these two-sided bounds:

$$
(1 + k_s)^{-1} \|(u,\hat u)\|_{X,s}^2 \;\le\; \|u\|_\Omega^2 + s^2 \|AE\hat u\|_\Omega^2 \;\le\; (1 + k_s)\, \|(u,\hat u)\|_{X,s}^2 . \tag{7.52}
$$
Proof. By the triangle inequality,
∥𝑢∥ 2𝛺 ≤ (∥𝑢 − 𝐸 𝑢ˆ − 𝑣 0 ∥ 𝛺 + ∥𝐸 𝑢ˆ + 𝑣 0 ∥ 𝛺 )2
≤ (1 + 𝛼 −2 )∥𝑢 − 𝐸 𝑢ˆ − 𝑣 0 ∥ 2𝛺 + (1 + 𝛼2 )𝐶𝑉2 ∥ 𝐴(𝐸 𝑢ˆ + 𝑣 0 )∥ 2𝛺 ,
where we have used (7.49) and the inequality (𝑎 + 𝑏)2 ≤ (1 + 𝛼 −2 )𝑎 2 + (1 + 𝛼2 )𝑏 2
for numbers 𝑎, 𝑏 and 𝛼 > 0. Using (7.50),
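$$
\|u\|_\Omega^2 + s^2 \|AE\hat u\|_\Omega^2 \le (1 + \alpha^{-2}) \|u - E\hat u - v_0\|_\Omega^2
+ \bigl[(1 + \alpha^2) c_s + 1\bigr] \bigl( s^2 \|AE\hat u\|_\Omega^2 + s^2 \|A v_0\|_\Omega^2 \bigr)
$$

with $c_s = C_V^2/s^2$. Now set $\alpha^2 = \tfrac12\bigl(-c_s + \sqrt{c_s^2 + 4c_s}\bigr)/c_s$ so that $(1 + \alpha^2) c_s = \alpha^{-2}$.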
Then 1 + 𝛼 −2 = 1 + (1 + 𝛼2 )𝑐 𝑠 = 1 + 𝑘 𝑠 and the last inequality of (7.52) follows
after taking the infimum over all 𝑣 0 ∈ 𝑉0 and applying Proposition 7.10.
For the first inequality of (7.52), we begin by noting that the choice of 𝑣 0 = 0 in
(7.51) gives 𝐾(𝑠, 𝑤) ≤ ∥𝑤∥ 2𝛺 . Together with Proposition 7.10, we then have
$$
\|(u,\hat u)\|_{X,s}^2 \le s^2 \|AE\hat u\|_\Omega^2 + \|u - E\hat u\|_\Omega^2 .
$$
By the triangle inequality and (7.49),
$$
\|(u,\hat u)\|_{X,s}^2 \le (1 + \alpha^{-2}) \|u\|_\Omega^2 + \bigl[(1 + \alpha^{2}) c_s + 1\bigr] s^2 \|AE\hat u\|_\Omega^2 .
$$

Choosing exactly the same $\alpha$ as before, $1 + \alpha^{-2} = 1 + (1 + \alpha^2) c_s = 1 + k_s$, and the first inequality of (7.52) is proved.
Note that when Proposition 7.11 is combined with (7.44) of Theorem 7.9, we
obtain
$$
\|u - u_h\|_\Omega^2 + s^2 \|AE(\hat u - \hat u_h)\|_\Omega^2
\le (1 + k_s)^2 \Bigl[ \inf_{w_h \in X_{h,0}} \|u - w_h\|_\Omega^2 + \inf_{\hat r_h \in \hat X_h} s^2 \|AE(\hat u - \hat r_h)\|_\Omega^2 \Bigr], \tag{7.53}
$$

where the constant 𝑘 𝑠 is as in Proposition 7.11. At the price of increasing the quasi-
optimality constant from the optimal one, this estimate gives a simpler implication
of (7.44) in easier norms.
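To get a feel for the size of the constant in (7.53), the following minimal sketch evaluates $k_s$ and checks the identity $(1+\alpha^2)c_s = \alpha^{-2} = k_s$ used in the proof of Proposition 7.11. The value of `C_V` is a hypothetical choice made only for illustration; the function names are not from the paper.

```python
import math

def k_s(C_V, s):
    """k_s = (c_s + sqrt(c_s^2 + 4 c_s)) / 2 with c_s = C_V^2 / s^2."""
    c = (C_V / s) ** 2
    return 0.5 * (c + math.sqrt(c * c + 4.0 * c))

def alpha_sq(C_V, s):
    """The alpha^2 chosen in the proof: (-c_s + sqrt(c_s^2 + 4 c_s)) / (2 c_s)."""
    c = (C_V / s) ** 2
    return 0.5 * (-c + math.sqrt(c * c + 4.0 * c)) / c

C_V = 2.0  # hypothetical stability constant, for illustration only
for s in (1.0, 10.0, 100.0):
    c = (C_V / s) ** 2
    a2 = alpha_sq(C_V, s)
    k = k_s(C_V, s)
    # identity used in the proof: (1 + alpha^2) c_s = alpha^{-2} = k_s
    assert math.isclose((1.0 + a2) * c, 1.0 / a2, rel_tol=1e-10)
    assert math.isclose(1.0 / a2, k, rel_tol=1e-10)
    print(f"s = {s:6.1f}   k_s = {k:.6f}   (1 + k_s)^2 = {(1.0 + k) ** 2:.6f}")
```

As $s$ grows, $k_s$ behaves like $C_V/s$, so the constant $(1+k_s)^2$ approaches one.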
Example 7.12 (Helmholtz equation for time-harmonic waves). The Helmholtz
equation arises in varied applications, including electromagnetics and acoustics.
For example, in the latter, the physics of acoustical disturbances (Courant and
Friedrichs 1948) show that by linearizing the isentropic Euler equations around a
hydrostatic solution and assuming harmonic time variations, we obtain
𝚤ˆ𝜔𝑣 + grad 𝜙 = 𝐺 in 𝛺, (7.54a)
𝚤ˆ𝜔𝜙 + div 𝑣 = 𝐹 in 𝛺, (7.54b)
for some given 𝜔 > 0, 𝐹 ∈ 𝐿 2 (𝛺) and 𝐺 ∈ 𝐿 2 (𝛺) 𝑁 . Here 𝑣 : 𝛺 → C 𝑁 and
𝜙 : 𝛺 → C are velocity and pressure variables, respectively, associated to the
acoustic perturbations from equilibrium, complexified under the standard time-
harmonic assumption. These equations must be supplemented by a boundary
condition. Let us consider the impedance boundary condition
𝑣·𝑛−𝜙=0 on 𝜕𝛺. (7.54c)
Other Dirichlet, Neumann or mixed-type boundary conditions can equally well be
considered. Note that taking the divergence of (7.54a) and substituting the value
of div 𝑣 from (7.54b), we recover the popular second-order form of the Helmholtz
equation for 𝜙 (which we shall not use here).
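For completeness, the elimination just described can be written out as follows. This is a formal computation, assuming $G$ is regular enough for $\operatorname{div} G$ to make sense.

```latex
\begin{aligned}
\hat\imath\omega \operatorname{div} v + \Delta\phi &= \operatorname{div} G
  && \text{(divergence of (7.54a))},\\
\operatorname{div} v &= F - \hat\imath\omega\phi
  && \text{(from (7.54b))},\\
\hat\imath\omega F + \omega^2 \phi + \Delta\phi &= \operatorname{div} G
  && \text{(substituting, using } \hat\imath^2 = -1\text{)},\\
-\Delta\phi - \omega^2\phi &= \hat\imath\omega F - \operatorname{div} G .
\end{aligned}
```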
The first-order system (7.54) can be written as 𝐴𝑢 = 𝑓 using the group variable
𝑢 = (𝑣, 𝜙) ∈ 𝐿 2 (𝛺) 𝑁 × 𝐿 2 (𝛺), the unbounded operator
𝐴𝑢 = (ˆ𝚤 𝜔𝑣 + grad 𝜙, 𝚤ˆ𝜔𝜙 + div 𝑣)
and 𝑓 = (𝐺, 𝐹) ∈ 𝐿 2 (𝛺) 𝑁 × 𝐿 2 (𝛺). Clearly (7.54) is in the setting of (7.2) with
𝑚 = 𝑁 + 1 and dom 𝐴 equal to
𝑉 = {(𝑧, 𝜇) ∈ 𝐻(div, 𝛺) × 𝐻 1 (𝛺) : 𝑧 · 𝑛 = 𝜇 on 𝜕𝛺}.
Its adjoint is
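$$
A^* \tilde u = (-\hat\imath\omega \tilde v - \operatorname{grad} \tilde\phi,\; -\hat\imath\omega \tilde\phi - \operatorname{div} \tilde v)
$$

for any $\tilde u = (\tilde v, \tilde\phi)$ in dom $A^*$, which equals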
𝑉˜ = {(𝑧, 𝜇) ∈ 𝐻(div, 𝛺) × 𝐻 1 (𝛺) : 𝑧 · 𝑛 = −𝜇 on 𝜕𝛺},
a space analogous to 𝑉 but with a change of sign in the boundary condition. It is
easy to verify that (7.3), (7.4) and (7.10) hold. Using the standard trace theory of
𝐻(div, 𝛺) and 𝐻 1 (𝛺), it is also easy to verify that (7.14) holds.
To apply Theorems 7.6 and 7.9, it therefore suffices to verify the bijectivity
in (7.30). Injectivity follows from uniqueness of Helmholtz solutions, so (7.30)
follows from stability results of the form
∥𝑣∥ 𝛺 ≤ 𝐶(𝜔)∥ 𝐴𝑣∥ 𝛺 , 𝑣 ∈ 𝑉, (7.55)
which is the same as (7.49) with 𝐶𝑉 = 𝐶(𝜔). The inequality (7.55) was proved in
Demkowicz, Gopalakrishnan, Muga and Zitelli (2012b, Lemmas 4.2 and 4.3), using
a result of Melenk (1995), with a 𝐶(𝜔) independent of 𝜔 on a convex domain 𝛺 for
the present case of impedance boundary conditions. For other boundary conditions
or on trapping domains, we generally expect (7.55) to hold with an 𝜔-dependent
constant. Hence we proceed assuming that (7.55) holds, and consider the DPG
ultraweak formulation to find 𝑢 ∈ 𝐿 2 (𝛺) 𝑁 +1 and 𝑢ˆ ∈ 𝑄 = range(𝐷 ℎ,𝑉 ) satisfying
$$
(u, A_h^* \tilde w)_\Omega + \langle \hat u, \tilde w\rangle_{\tilde W_h} = F(\tilde w) \qquad \text{for all } \tilde w \in \tilde W_h, \tag{7.56}
$$
with the broken adjoint Helmholtz operator 𝐴∗ℎ . We conclude that this is a wellposed
formulation by Theorem 7.6.
Next let us apply Theorem 7.9 with $s = 1/\delta$ for some small $0 < \delta \ll 1$ for an ideal DPG approximation $(u_h, \hat u_h) \in X_{0,h} \times \hat X_h \subset X_h$ of (7.56). Recall that the combination of Proposition 7.11 and Theorem 7.9 yields (7.53), where now $1 + k_s \le 1 + c\delta$ for a constant $c > 0$ that depends only on $C(\omega)$. Then (7.53) implies

$$
\|u - u_h\|_\Omega^2 + \frac{1}{\delta^2} \|AE(\hat u - \hat u_h)\|_\Omega^2
\le (1 + c\delta)^2 \Bigl[ \inf_{w_h \in X_{h,0}} \|u - w_h\|_\Omega^2 + \inf_{\hat r_h \in \hat X_h} \frac{1}{\delta^2} \|AE(\hat u - \hat r_h)\|_\Omega^2 \Bigr].
$$

This estimate was arrived at by other means (without using Theorem 7.9) in Gopalakrishnan, Muga and Olivares (2014). There it was offered as a justification for the
practically visible marked improvement in DPG Helmholtz solutions as 𝛿 is made
smaller. The improvement in solutions was further justified there via a dispersion
analysis of the DPG method for the Helmholtz equation on an infinite uniform
stencil. Since the test norm
$$
\|\tilde w\|_{Y,1/\delta}^2 = \delta^2 \|\tilde w\|_\Omega^2 + \|A_h^* \tilde w\|_\Omega^2
$$
becomes smaller as 𝛿 is made smaller, a takeaway from such observations is that it
pays to use a weaker norm on the test space when computing wave solutions.

8. Optimal test functions in time integrators

In this section we present an application of the DPG ideas to design an exponential
integrator for initial value problems. The resulting method yields a discrete solution
not only at the time steps but also between the time steps. In fact, in each time
interval, the discrete solution is the best (in 𝐿 2 -norm) possible approximation of
the exact solution from a polynomial space. We also show how the DPG error
representation can be used for a posteriori error control within the time integrators.

8.1. An initial value system

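Let $K \in \mathbb{C}^{m\times m}$ be a non-singular matrix and $\Omega = (0,1) \subset \mathbb{R}$. Given $u_0 \in \mathbb{C}^m$ and $f \in L^2(\Omega)^m$, consider the initial value problem for $u : \Omega \to \mathbb{C}^m$ satisfying

$$
\frac{du}{dt} + Ku = f, \quad 0 < t < 1, \qquad u(0) = u_0. \tag{8.1}
$$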
This can be viewed as a generalization of Example 2.6. We may proceed similarly
to get the weak problem to find 𝑢 ∈ 𝐿 2 (𝛺)𝑚 and 𝑢ˆ ∈ C𝑚 satisfying
(𝑢, 𝐴∗ 𝑣)𝛺 + 𝑢ˆ 𝑣(1) = ( 𝑓 , 𝑣)𝛺 + 𝑢 0 𝑣(0) for all 𝑣 ∈ 𝐻 1 (𝛺)𝑚 , (8.2)
with
$$
A^* v = -\frac{dv}{dt} + K^* v .
$$

This fits into our framework with

$$
b((w,\hat w), y) = (w, A^* y)_\Omega + \hat w\, y(1), \tag{8.3a}
$$
$$
\ell(v) = (f, v)_\Omega + u_0\, v(0). \tag{8.3b}
$$
An alternative avenue to arrive at the same weak formulation is the approach
of Section 7, that is, one would set 𝛺 = (0, 1), an unbounded operator 𝐴𝑢 =
𝑑𝑢/𝑑𝑡 + 𝐾𝑢, on 𝐿 2 (𝛺)𝑚 with dom(𝐴) = {𝑢 ∈ 𝐻 1 (𝛺)𝑚 : 𝑢(0) = 0}, and develop
the following formulation for homogeneous initial conditions: (𝑢, 𝐴∗ 𝑣) = ( 𝑓 , 𝑣) for
all 𝑣 ∈ 𝑉 ∗ = dom 𝐴∗ = {𝑢 ∈ 𝐻 1 (𝛺)𝑚 : 𝑢(1) = 0}. One would then extend it to
cover the non-homogeneous initial condition 𝑢 0 by a process similar to going from
unbroken to broken graph spaces, employing a larger class of test functions that do
not vanish at 𝑡 = 1. This would then result in the additional unknown, the trace 𝑢ˆ,
and one would obtain (8.2) again. Of course, for regular solutions, 𝑢ˆ = 𝑢(1).
Returning to (8.3), we endow the trial space 𝑋 = 𝐿 2 (𝛺)𝑚 × C𝑚 with the norm
$$
\|(w,\hat w)\|_X^2 = \|w\|_\Omega^2 + |\hat w|_2^2 ,
$$

where $|\hat w|_2^2 = |\hat w_1|^2 + \dots + |\hat w_m|^2$ denotes the square of the $\ell^2$-norm of $\hat w \in \mathbb{C}^m$. On the test space $Y = H^1(\Omega)^m$, the corresponding optimal test norm of (3.6) can then be computed easily. Namely, with $b$ as in (8.3), we see that

$$
|||y|||_Y = \sup_{0 \ne (w,\hat w) \in X} \frac{|b((w,\hat w), y)|}{\|(w,\hat w)\|_X} = \bigl( \|A^* y\|_\Omega^2 + |y(1)|_2^2 \bigr)^{1/2}. \tag{8.4a}
$$

We then set the norm on 𝑌 = 𝐻 1 (𝛺)𝑚 to be the optimal test norm, that is,
∥𝑦∥𝑌 = ||||𝑦||||𝑌 . (8.4b)
Clearly, for any 𝑣, 𝑦 ∈ 𝑌 ,
$$
\begin{aligned}
(v, y)_Y &= (A^* v, A^* y)_\Omega + v(1) \cdot y(1) \\
&= (-\dot v + K^* v,\, -\dot y + K^* y)_\Omega + v(1) \cdot y(1)
\end{aligned} \tag{8.4c}
$$

is the inner product that generates the optimal test norm above. Here $\dot v = dv/dt$.
This is one of the rare cases where the optimal test norm is so readily calculable.
The ideal Petrov–Galerkin method (i.e. the IPG method of Definition 2.3) uses the optimal test space, which we now examine. Using the inner product in (8.4c), the variational problem for the optimal test function $v$ corresponding to any $(w,\hat w) \in X$ reads as follows for any $y \in H^1(\Omega)^m$:

$$
(-\dot v + K^* v,\, -\dot y + K^* y)_\Omega + v(1) \cdot y(1) = (w,\, -\dot y + K^* y)_\Omega + \hat w\, y(1). \tag{8.5}
$$

For any $g \in L^2(\Omega)^m$, since the initial value problem $\dot y - K^* y = -g$ with initial condition $y(1) = 0$ is solvable, we find that (8.5) implies $(-\dot v + K^* v, g)_\Omega = (w, g)_\Omega$ for all $g$ in $L^2(\Omega)^m$, i.e. $-\dot v + K^* v = w$. Using this in (8.5), we then conclude that $v(1) = \hat w$. Thus the optimal test function $v$ of any $(w,\hat w) \in X$ is the solution of

$$
-\frac{dv}{dt} + K^* v = w \quad \text{in } \Omega, \tag{8.6a}
$$
$$
v(1) = \hat w. \tag{8.6b}
$$
Recall the matrix exponential, defined by
$$
e^{A} = \sum_{k=0}^{\infty} \frac{1}{k!} A^k . \tag{8.7}
$$

Using it, the solution of (8.6) can be written down in closed form by the variation of
constants method. Namely, the optimal test function 𝑣 and the trial-to-test operator
𝑇 are given by
$$
v(t) = T(w,\hat w) = e^{K^*(t-1)} \hat w + e^{K^* t} \int_t^1 e^{-K^* \tau} w(\tau)\, d\tau . \tag{8.8}
$$

Throughout this section, we fix $T$ to be this operator. Because we have chosen the optimal test norm, we are now able to prove that the resulting Petrov–Galerkin method produces the best possible $L^2$-approximation within $(0,1)$, and furthermore, the numerical flux approximation at the endpoint $t = 1$ has zero error.
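The closed form for the optimal test function can be checked numerically. The sketch below does this in the scalar case $m = 1$, where $K^*$ is just the complex conjugate of $K$. It writes the integral in the variation of constants formula as $\int_t^1$, which is the orientation for which $-\dot v + K^* v = w$ and $v(1) = \hat w$ hold. The values of `K`, `w` and `w_hat` are arbitrary illustrative choices, and the Simpson rule and central difference are stand-ins for exact integration and differentiation.

```python
import cmath

def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def optimal_test_function(K, w, w_hat):
    """Scalar (m = 1) optimal test function; K* is the conjugate of K."""
    Ks = K.conjugate()
    def v(t):
        integral = simpson(lambda tau: cmath.exp(-Ks * tau) * w(tau), t, 1.0)
        return cmath.exp(Ks * (t - 1.0)) * w_hat + cmath.exp(Ks * t) * integral
    return v

K = 1.5 + 0.5j                        # illustrative scalar "matrix"
w = lambda tau: tau ** 2 + 1j * tau   # illustrative trial function
w_hat = 0.3 - 0.2j                    # illustrative endpoint value
v = optimal_test_function(K, w, w_hat)

# Residuals of the final condition v(1) = w_hat and of -v' + K* v = w,
# with v' approximated by a central difference.
d = 1e-4
final_err = abs(v(1.0) - w_hat)
ode_err = max(
    abs(-(v(t + d) - v(t - d)) / (2 * d) + K.conjugate() * v(t) - w(t))
    for t in (0.2, 0.5, 0.8)
)
print(final_err, ode_err)
```

For a genuine matrix $K$ one would replace `cmath.exp` by a matrix exponential routine; the structure of the check stays the same.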

Proposition 8.1 (Optimality of interior solution and endpoint exactness). Let $U_h$ be any finite-dimensional subspace of $L^2(0,1)^m$ and let the interior solution $u_h \in U_h$ together with the endpoint solution $\hat u_h \in \mathbb{C}^m$ satisfy

$$
(u_h, A^* v)_\Omega + \hat u_h\, v(1) = (f, v)_\Omega + u_0\, v(0) \qquad \text{for all } v \in Y_h^{\mathrm{opt}}, \tag{8.9}
$$

where $Y_h^{\mathrm{opt}} = T(U_h \times \mathbb{C}^m)$ for the $T$ given by (8.8). Let $u$ be the exact solution of (8.1). Then

$$
u_h = \Pi_U u \quad \text{and} \quad \hat u_h = \hat u = u(1), \tag{8.10}
$$

where $\Pi_U$ is the $L^2$-orthogonal projection into $U_h$.
Proof. The norm choice in (8.4b) makes the form $b((w,\hat w), y)$ into a generalized duality pairing (as in Definition 3.5), so by Proposition 3.6, the energy norm is the same as the $\|\cdot\|_X$-norm. Hence, by Theorem 3.2(b), the solution of the IPG method for this formulation equals the best approximation, that is, the given $u_h$ and $\hat u_h$ satisfy

$$
\|u - u_h\|_\Omega^2 + |\hat u - \hat u_h|_2^2 = \inf_{w_h \in U_h,\ \hat w_h \in \mathbb{C}^m} \bigl( \|u - w_h\|_\Omega^2 + |\hat u - \hat w_h|_2^2 \bigr).
$$

It is easy to see that the infimum equals $\|u - \Pi_U u\|_\Omega^2$, attained at $w_h = \Pi_U u$ and $\hat w_h = \hat u$. Hence the identities of (8.10) follow.

8.2. The discrete system

Consider the basis for the set of vector polynomials $P_p(\Omega)^m$ given by monomials $t^j e_i$ for $t \in \Omega = (0,1)$, $j = 0, \dots, p$, and $i = 1, \dots, m$ (where $e_i$ are the standard unit vectors). Let us set $U_h$ in Proposition 8.1 by

$$
U_h = P_p(\Omega)^m = \operatorname{span}\{ t^j e_i : j = 0, \dots, p,\ i = 1, \dots, m \}
$$

and examine how to solve for $u_h$ and $\hat u_h$ in (8.9). Then we introduce the following functions that emerge from the previous formula in (8.8) for the trial-to-test operator:

$$
\begin{aligned}
\hat v_i &:= T(0, e_i) = e^{K^*(t-1)} e_i, \\
v_{0,i} &:= T(e_i, 0) = K^{-*} \bigl[ I - e^{K^*(t-1)} \bigr] e_i, \\
v_{p,i} &:= T(t^p e_i, 0) = e^{K^* t} \int_t^1 e^{-K^* \tau} \tau^p e_i \, d\tau \\
&\;= K^{-*} \bigl( t^p e_i + p\, v_{p-1,i} - \hat v_i \bigr)
\end{aligned}
$$
for 𝑝 = 1, 2, . . . , where we have integrated by parts to get the last identity. Given
any 𝑀 ∈ C𝑚×𝑚 and 𝑡 > 0, define the matrix-valued functions
$$
R_p(M, t) := \sum_{j=0}^{p} \frac{p!}{j!} (Mt)^j \quad \text{and} \quad \hat v(M, t) := e^{M(t-1)} .
$$

Then let

$$
v_r(M, t) = M^{-r-1} \bigl( R_r(M, t) - R_r(M, 1)\, \hat v(M, t) \bigr). \tag{8.11}
$$

Using this notation, we can express the previously given optimal test functions as

$$
v_{r,i} = v_r(K^*, t)\, e_i \qquad \text{for all } r = 0, 1, \dots, p.
$$
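As a sanity check on the closed form (8.11), the following sketch compares it, in the scalar case, with the functions produced by the integration-by-parts recursion $v_r = M^{-1}(t^r + r\,v_{r-1} - \hat v)$ starting from $v_0 = M^{-1}(1 - \hat v)$. The value of `M` is an arbitrary illustrative choice and the helper names are hypothetical.

```python
import math

def R(r, M, t):
    """R_r(M, t) = sum_{j=0}^r (r!/j!) (M t)^j, scalar case."""
    return sum(math.factorial(r) / math.factorial(j) * (M * t) ** j
               for j in range(r + 1))

def v_hat(M, t):
    """v_hat(M, t) = exp(M (t - 1))."""
    return math.exp(M * (t - 1.0))

def v_closed(r, M, t):
    """Closed form (8.11), scalar case."""
    return (R(r, M, t) - R(r, M, 1.0) * v_hat(M, t)) / M ** (r + 1)

def v_recursive(r, M, t):
    """Same functions built from the integration-by-parts recursion."""
    if r == 0:
        return (1.0 - v_hat(M, t)) / M
    return (t ** r + r * v_recursive(r - 1, M, t) - v_hat(M, t)) / M

M = 0.7  # illustrative scalar value
max_gap = max(abs(v_closed(r, M, t) - v_recursive(r, M, t))
              for r in range(5) for t in (0.0, 0.3, 0.9, 1.0))
print(max_gap)
```

Both constructions also give $v_r(M, 1) = 0$, consistent with $T(w, 0)$ vanishing at $t = 1$.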
When these optimal test function expressions are substituted into (8.9), we obtain
a system for the discrete solution
$$
u_h = \sum_{j=0}^{p} u_{h,j}\, t^j, \qquad u_{h,j} \in \mathbb{C}^m, \tag{8.12a}
$$

which couples the solution coefficients $u_{h,j}$ by

$$
\sum_{j=0}^{p} a_{rj}\, u_{h,j} = v_r(K, 0)\, u_0 + \int_0^1 v_r(K, t)\, f(t)\, dt, \qquad r = 0, \dots, p, \tag{8.12b}
$$

where

$$
a_{rj} := \int_0^1 t^{j+r}\, dt .
$$

This is a system of $p + 1$ equations for the (vector-valued) unknowns $u_{h,j}$, $j = 0, \dots, p$. The endpoint trace, which equals the exact solution by Proposition 8.1, is given by

$$
\hat u_h = \hat v(K, 0)\, u_0 + \int_0^1 \hat v(K, t)\, f(t)\, dt . \tag{8.12c}
$$

Thus, to compute $\hat u_h$ and $u_{h,j}$, we need techniques to compute the integrals involving matrix exponentials.

8.3. Exponential quadrature rules

To proceed, as seen above, we must digress to review standard exponential integrators, which, for the system (8.1), are based on the formula for the exact solution obtained by the method of variation of constants, namely

$$
u(t) = e^{-Kt} u_0 + e^{-Kt} \int_0^t e^{K\tau} f(\tau)\, d\tau . \tag{8.13}
$$

Applying the formula recursively to intervals $[t_{k-1}, t_k]$, we have

$$
u(t_k) = e^{-h_k K} u(t_{k-1}) + \int_{t_{k-1}}^{t_k} e^{(\tau - t_k) K} f(\tau)\, d\tau, \qquad h_k := t_k - t_{k-1} .
$$

The integral above can be approximated using standard exponential quadrature rules that we now describe.

Selecting $s$ arbitrary quadrature points $c_i \in [0,1]$, $i = 1, \dots, s$, we approximate the right-hand side function $f(\tau)$ in the time interval $[t_{k-1}, t_k]$ by

$$
f(\tau) \approx \sum_{i=1}^{s} f(t_{k-1} + c_i h_k)\, \tilde l_i(\tau),
$$

where $l_i$, $i = 1, \dots, s$, are the Lagrange polynomials of order $s - 1$ on the unit interval $I = [0,1]$,

$$
l_i(\theta) = \prod_{j=1,\ j \ne i}^{s} \frac{\theta - c_j}{c_i - c_j}, \qquad i = 1, \dots, s,
$$

and $\tilde l_i(\tau)$ are the corresponding mapped Lagrange polynomials on the interval $[t_{k-1}, t_k]$. Substituting the approximation for $f(\tau)$ into the formula (8.13) from variation of constants, we obtain a time-marching scheme,

$$
u_k = e^{-h_k K} u_{k-1} + h_k \sum_{i=1}^{s} b_i(-h_k K)\, f_i ,
$$

where $u_k \approx u(t_k)$, $f_i = f(t_{k-1} + c_i h_k)$, and the weights are defined by

$$
b_i(z) := \int_0^1 e^{(1-\theta) z}\, l_i(\theta)\, d\theta, \qquad i = 1, \dots, s.
$$

It is standard to compute the weights using the so-called ‘𝜙-functions’ (see e.g.
Al-Mohy and Higham 2011 or Niesen and Wright 2012), defined as follows:

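$$
\begin{aligned}
\phi_0(z) &:= e^{z}, \\
\phi_p(z) &:= \int_0^1 e^{(1-\theta) z}\, \frac{\theta^{p-1}}{(p-1)!}\, d\theta
= \frac{1}{z} \Bigl( \phi_{p-1}(z) - \frac{1}{(p-1)!} \Bigr), \qquad p = 1, 2, \dots .
\end{aligned}
$$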

The two simple examples below show how they are used.
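For experimentation, here is a minimal scalar sketch of two ways to evaluate the $\phi$-functions: the recurrence displayed above, and the Taylor series $\phi_p(z) = \sum_{k \ge 0} z^k/(k+p)!$, a standard identity that is not stated in the text. The function names and the truncation length `terms` are illustrative assumptions, and production codes use more careful algorithms, such as those in the references cited above.

```python
import math

def phi_series(p, z, terms=40):
    """phi_p(z) = sum_{k>=0} z^k / (k+p)!  (truncated Taylor series, scalar z)."""
    return sum(z ** k / math.factorial(k + p) for k in range(terms))

def phi_recurrence(p, z):
    """phi_p(z) from phi_0 = e^z and phi_q = (phi_{q-1} - 1/(q-1)!) / z, z != 0."""
    val = math.exp(z)
    for q in range(1, p + 1):
        val = (val - 1.0 / math.factorial(q - 1)) / z
    return val

# The two evaluations agree for moderate |z| ...
for p in (1, 2, 3):
    print(p, phi_series(p, -2.0), phi_recurrence(p, -2.0))

# ... but the recurrence suffers from cancellation when |z| is small,
# whereas the series remains well behaved there.
print(phi_series(3, 1e-6), phi_recurrence(3, 1e-6))
```

The series is also well defined at $z = 0$, where $\phi_p(0) = 1/p!$.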

Example 8.2 (A standard one-point integrator). Selecting a single point $c_1 \in [0,1]$, we have $l_1(\theta) = 1$, $b_1(z) = \phi_1(z)$, $e^{z} = z\phi_1(z) + 1$, which gives

$$
u_k = u_{k-1} + h_k\, \phi_1(-h_k K) ( f_1 - K u_{k-1} ),
$$

an integrator formula for the $s = 1$ case.
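A minimal scalar sketch of this one-point scheme follows. For a constant right-hand side the interpolation of $f$ is exact, so the scheme should reproduce the exact solution up to rounding, which the script checks. The numerical values are arbitrary illustrative choices.

```python
import math

def phi1(z):
    """phi_1(z) = (e^z - 1)/z, with the limit value 1 at z = 0."""
    return math.expm1(z) / z if z != 0.0 else 1.0

def one_point_step(u_prev, K, h, f1):
    """u_k = u_{k-1} + h phi_1(-hK) (f_1 - K u_{k-1}), scalar case."""
    return u_prev + h * phi1(-h * K) * (f1 - K * u_prev)

K, u0, f_const = 3.0, 1.0, 2.0   # illustrative data with constant f
u, t, h = u0, 0.0, 0.25
for _ in range(4):
    u = one_point_step(u, K, h, f_const)
    t += h

# Exact solution of u' + K u = f_const, u(0) = u0.
exact = math.exp(-K * t) * u0 + (1.0 - math.exp(-K * t)) * f_const / K
print(u, exact)
```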

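Example 8.3 (A standard two-point integrator). Selecting $c_1, c_2 \in [0,1]$, we have

$$
\begin{aligned}
b_1(z) &= \int_0^1 e^{(1-\theta) z}\, \frac{\theta - c_2}{c_1 - c_2}\, d\theta \\
&= \frac{1}{c_1 - c_2} \int_0^1 e^{(1-\theta) z}\, \theta\, d\theta - \frac{c_2}{c_1 - c_2} \int_0^1 e^{(1-\theta) z}\, d\theta \\
&= \frac{1}{c_1 - c_2}\, \phi_2(z) - \frac{c_2}{c_1 - c_2}\, \phi_1(z).
\end{aligned}
$$

Similarly,

$$
b_2(z) = \frac{1}{c_2 - c_1}\, \phi_2(z) - \frac{c_1}{c_2 - c_1}\, \phi_1(z).
$$

Thus we obtain

$$
\begin{aligned}
u_k = u_{k-1} &- h_k K\, \phi_1(-h_k K)\, u_{k-1} \\
&+ h_k \Bigl( \frac{1}{c_1 - c_2}\, \phi_2(-h_k K) - \frac{c_2}{c_1 - c_2}\, \phi_1(-h_k K) \Bigr) f_1 \\
&+ h_k \Bigl( \frac{1}{c_2 - c_1}\, \phi_2(-h_k K) - \frac{c_1}{c_2 - c_1}\, \phi_1(-h_k K) \Bigr) f_2 ,
\end{aligned}
$$

a standard two-point exponential integrator.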
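The two-point scheme can be tried out in the scalar case as sketched below. With two nodes the interpolant of a linear $f$ is exact, so the scheme should reproduce the exact solution for such data, and the weights should satisfy $b_1 + b_2 = \phi_1$. The nodes, step size and data are arbitrary illustrative choices, and the series evaluation of $\phi_p$ is an assumption made for simplicity.

```python
import math

def phi(p, z, terms=40):
    """phi_p(z) via the truncated Taylor series sum_k z^k / (k+p)! (scalar z)."""
    return sum(z ** k / math.factorial(k + p) for k in range(terms))

def weights(z, c1, c2):
    """Weights b_1(z), b_2(z) of the two-point rule."""
    b1 = (phi(2, z) - c2 * phi(1, z)) / (c1 - c2)
    b2 = (phi(2, z) - c1 * phi(1, z)) / (c2 - c1)
    return b1, b2

def two_point_step(u_prev, K, t_prev, h, f, c1, c2):
    """One step u_{k-1} -> u_k of the two-point scheme, scalar case."""
    b1, b2 = weights(-h * K, c1, c2)
    return (math.exp(-h * K) * u_prev
            + h * (b1 * f(t_prev + c1 * h) + b2 * f(t_prev + c2 * h)))

K, u0, h = 2.0, 1.0, 0.5                 # illustrative data
f = lambda t: 1.0 + 3.0 * t              # linear right-hand side
# Exact solution of u' + 2u = 1 + 3t, u(0) = 1.
exact = lambda t: -0.25 + 1.5 * t + 1.25 * math.exp(-K * t)

u, t = u0, 0.0
for _ in range(2):
    u = two_point_step(u, K, t, h, f, c1=0.25, c2=0.75)
    t += h

b1, b2 = weights(-h * K, 0.25, 0.75)
step_err = abs(u - exact(t))
sum_err = abs(b1 + b2 - phi(1, -h * K))
print(step_err, sum_err)
```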

To connect these existing results to the IPG scheme, first note that (8.12c) is
exactly the variation of constants formula (8.13) (which is also as expected from
the endpoint exactness result of Proposition 8.1). Hence the above-described
standard exponential integrator formulas can be used to compute the IPG fluxes
𝑢ˆ ℎ at the time steps 𝑡 𝑘 . It remains to discuss how to compute the solution 𝑢 ℎ in
between.

8.4. An exponential integrator for interior solution in between time steps

Going beyond the classical exponential integration schemes, we now discuss a
new feature arising from the DPG method, namely the capability to also compute
an interior solution field that represents the 𝐿 2 -projection of the solution onto
the polynomial spaces within the intervals [𝑡 𝑘−1 , 𝑡 𝑘 ]. To this end, we obtain a
discrete scheme from the system (8.12b) using the following result. Its proof is an
elementary but lengthy calculation which can be found in Muñoz-Matute, Pardo
and Demkowicz (2021), and is omitted here. The result allows us to compute the
optimal test functions using the standard 𝜙 functions.

Proposition 8.4. The following relations between optimal test functions (8.11)
and the 𝜙 functions hold:
$$
\begin{aligned}
v_r(M, 0) &= \sum_{j=0}^{r} \frac{r!}{j!}\, (-1)^{r-j}\, \phi_{r-j+1}(-M), \\
\int_0^1 v_r(M, t)\, t^q\, dt &= q! \sum_{j=0}^{r} \frac{r!}{j!}\, (-1)^{r-j}\, \phi_{r-j+q+2}(-M)
\end{aligned} \tag{8.14}
$$

for any 𝑀 in C𝑚×𝑚 (including the scalar case 𝑚 = 1).
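Since the proof is omitted, a quick numerical check of both relations in the scalar case may be reassuring. The sketch below compares the closed form (8.11) and a Simpson approximation of the moments with the $\phi$-function sums. The value of `M`, the series evaluation of $\phi_p$ and the quadrature rule are illustrative assumptions, not part of the cited proof.

```python
import math

def phi(p, z, terms=40):
    """phi_p(z) via the truncated Taylor series sum_k z^k / (k+p)! (scalar z)."""
    return sum(z ** k / math.factorial(k + p) for k in range(terms))

def v(r, M, t):
    """Optimal test function v_r(M, t) from (8.11), scalar case."""
    R = lambda s: sum(math.factorial(r) / math.factorial(j) * (M * s) ** j
                      for j in range(r + 1))
    return (R(t) - R(1.0) * math.exp(M * (t - 1.0))) / M ** (r + 1)

def endpoint_formula(r, M):
    """Right-hand side of the first relation in (8.14)."""
    return sum(math.factorial(r) / math.factorial(j) * (-1) ** (r - j)
               * phi(r - j + 1, -M) for j in range(r + 1))

def moment_formula(r, q, M):
    """Right-hand side of the second relation in (8.14)."""
    return math.factorial(q) * sum(
        math.factorial(r) / math.factorial(j) * (-1) ** (r - j)
        * phi(r - j + q + 2, -M) for j in range(r + 1))

def simpson(g, a, b, n=400):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

M = 0.8  # illustrative scalar value
err_endpoint = max(abs(v(r, M, 0.0) - endpoint_formula(r, M)) for r in range(4))
err_moment = max(
    abs(simpson(lambda t: v(r, M, t) * t ** q, 0.0, 1.0) - moment_formula(r, q, M))
    for r in range(3) for q in range(3))
print(err_endpoint, err_moment)
```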

Example 8.5 (A one-point integrator for the interior IPG solution). Utilizing Proposition 8.4, the system (8.12b) for $p = 0$ reduces to the following scheme:

$$
u_{h,0}^{k} = \phi_1(-h_k K)\, \hat u_h^{k-1} + h_k\, \phi_2(-h_k K)\, f_1 .
$$

It computes a constant interior solution from $\hat u_h^{k-1}$ and $f_1$ that is guaranteed to equal the mean of the exact solution.
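The mean-value property can be observed numerically. The following scalar sketch, with constant $f$ so that the one-point interpolation is exact, compares the scheme with the average of the exact solution over one step, the latter computed by a Simpson rule. All numerical values and helper names are illustrative assumptions.

```python
import math

def phi1(z):
    """phi_1(z) = (e^z - 1)/z for z != 0."""
    return math.expm1(z) / z

def phi2(z):
    """phi_2(z) = (phi_1(z) - 1)/z for z != 0."""
    return (phi1(z) - 1.0) / z

def interior_constant(u_hat_prev, K, h, f1):
    """u_{h,0}^k = phi_1(-hK) u_hat^{k-1} + h phi_2(-hK) f_1, scalar case."""
    return phi1(-h * K) * u_hat_prev + h * phi2(-h * K) * f1

def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3

K, h, u_prev, f1 = 3.0, 0.25, 1.0, 2.0   # illustrative data, constant f
# Exact solution on the step, starting from u_prev at local time 0.
exact = lambda t: math.exp(-K * t) * u_prev + (1.0 - math.exp(-K * t)) * f1 / K

mean_exact = simpson(exact, 0.0, h) / h
u_h0 = interior_constant(u_prev, K, h, f1)
print(u_h0, mean_exact)
```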

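Example 8.6 (A two-point integrator for the interior IPG solution). For $p = 1$, the system (8.12b) for the two coefficients of the interior solution reduces to the following system of two equations after applying Proposition 8.4:

$$
u_{h,0}^{k} + \tfrac12\, u_{h,1}^{k} = g_1\bigl( \hat u_h^{k-1}, f_1, f_2 \bigr), \tag{8.15a}
$$
$$
\tfrac12\, u_{h,0}^{k} + \tfrac13\, u_{h,1}^{k} = g_2\bigl( \hat u_h^{k-1}, f_1, f_2 \bigr), \tag{8.15b}
$$

where

$$
\begin{aligned}
g_1\bigl( \hat u_h^{k-1}, f_1, f_2 \bigr) = {}& \phi_1(-h_k K)\, \hat u_h^{k-1} \\
&+ h_k \Bigl( \frac{1}{c_1 - c_2}\, \phi_3(-h_k K) - \frac{c_2}{c_1 - c_2}\, \phi_2(-h_k K) \Bigr) f_1 \\
&+ h_k \Bigl( \frac{1}{c_2 - c_1}\, \phi_3(-h_k K) - \frac{c_1}{c_2 - c_1}\, \phi_2(-h_k K) \Bigr) f_2 ,
\end{aligned}
$$

and

$$
\begin{aligned}
g_2\bigl( \hat u_h^{k-1}, f_1, f_2 \bigr) = {}& \phi_1(-h_k K)\, \hat u_h^{k-1} - \phi_2(-h_k K)\, \hat u_h^{k-1} \\
&+ h_k \Bigl( \frac{1}{c_1 - c_2} \bigl( \phi_3(-h_k K) - \phi_4(-h_k K) \bigr) - \frac{c_2}{c_1 - c_2} \bigl( \phi_2(-h_k K) - \phi_3(-h_k K) \bigr) \Bigr) f_1 \\
&+ h_k \Bigl( \frac{1}{c_2 - c_1} \bigl( \phi_3(-h_k K) - \phi_4(-h_k K) \bigr) - \frac{c_1}{c_2 - c_1} \bigl( \phi_2(-h_k K) - \phi_3(-h_k K) \bigr) \Bigr) f_2 .
\end{aligned}
$$

The equations of (8.15) give the best $L^2$-approximation of the interior solution in the space of linear functions.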
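A minimal scalar sketch of (8.15) follows, restricted to the homogeneous case $f = 0$ on the unit interval ($h_k = 1$), so that only the $\hat u_h^{k-1}$ terms of $g_1$ and $g_2$ are exercised. It compares the resulting linear function with the $L^2$-projection of the exact solution, computed independently from Simpson approximations of the moments. The data and helper names are illustrative assumptions.

```python
import math

def phi1(z):
    """phi_1(z) = (e^z - 1)/z for z != 0."""
    return math.expm1(z) / z

def phi2(z):
    """phi_2(z) = (phi_1(z) - 1)/z for z != 0."""
    return (phi1(z) - 1.0) / z

def simpson(g, a, b, n=400):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3

def solve2(a11, a12, a21, a22, r1, r2):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - a21 * r1) / det

K, u_hat = 2.0, 1.5   # illustrative data; f = 0 and h_k = 1
g1 = phi1(-K) * u_hat
g2 = (phi1(-K) - phi2(-K)) * u_hat
u0c, u1c = solve2(1.0, 0.5, 0.5, 1.0 / 3.0, g1, g2)

# Independent L2-projection of the exact solution onto span{1, t}.
exact = lambda t: math.exp(-K * t) * u_hat
m0 = simpson(exact, 0.0, 1.0)
m1 = simpson(lambda t: t * exact(t), 0.0, 1.0)
p0, p1 = solve2(1.0, 0.5, 0.5, 1.0 / 3.0, m0, m1)
print((u0c, u1c), (p0, p1))
```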

8.5. A posteriori error estimation

Now that we have an interior solution, it is possible to get an error representation
through the DPG residual. Indeed, equation (3.4a) for the error representation 𝜀
now takes the form
$$
\begin{aligned}
(\varepsilon, v)_Y &= \ell(v) - b((u_h, \hat u_h), v) \\
&= (f, v)_\Omega + u_0\, v(0) - (u_h, A^* v)_\Omega - \hat u_h\, v(1)
\end{aligned}
$$

for all 𝑣 ∈ 𝑌 . For this example, it is possible to derive an explicit formula for
function 𝜀, as shown in Muñoz-Matute, Demkowicz and Pardo (2022). However,
applying the formula requires coming up with special quadrature rules for matrix-
valued functions and it is cumbersome to use. Instead, it is recommended to
compute an inexact error representation 𝜀𝑟 using (6.7a), namely
$$
(\varepsilon^r, v)_Y = \ell(v) - b((u_h, \hat u_h), v) \qquad \text{for all } v \in Y^r, \tag{8.16}
$$

with a $Y^r$ obtained by enlarging $Y_h^{\mathrm{opt}}$ by at least one linearly independent function. (One may, for example, set $Y^r = T(P_{p+1}(\Omega) \times \mathbb{C}^m) \supset Y_h^{\mathrm{opt}}$.) Solving for $\varepsilon^r$ from (8.16) then only involves solving a small linear system after the computation of $u_h$ and $\hat u_h$. Moreover, $\varepsilon^r$ is almost as good an error estimator as $\varepsilon$ because of the following result.

Proposition 8.7 (Error estimator for time integrator). Set $b$ and $\ell$ as in (8.3), $X_h = P_p(\Omega) \times \mathbb{C}^m$, $\Omega = (0,1)$, and using any $Y^r \supset Y_h^{\mathrm{opt}}$, solve the practical DPG method (6.7) for $x_h \in X_h$ and $\varepsilon^r \in Y^r$. Then $x_h$ coincides with the solution $(u_h, \hat u_h)$ of the IPG method (8.9). Moreover,

$$
\|\varepsilon^r\|_Y^2 \le \|\varepsilon\|_Y^2 \le \|\varepsilon^r\|_Y^2 + \operatorname{osc}(\ell)^2, \tag{8.17}
$$
$$
\|\varepsilon^r\|_Y \le \|u - u_h\|_\Omega \le \|\varepsilon^r\|_Y + \operatorname{osc}(\ell), \tag{8.18}
$$

where $\operatorname{osc}(\ell) = \|\ell \circ (I - P_{Y^r})\|_{Y^*}$ and $P_{Y^r}$ denotes the $Y$-orthogonal projection onto $Y^r$.
Proof. The stated results follow from discussions in Examples 5.3 and 6.5. Estimate (8.17) is immediate from (6.12). Furthermore, since the norm choice in (8.4b) makes $b((w,\hat w), y)$ into a generalized duality pairing, by Proposition 3.6, we know that $\|b\| = 1$ and $\gamma = 1$. Hence (6.9) implies
∥𝑥 − 𝑥 ℎ ∥ 𝑋 ≤ ∥𝜀𝑟 ∥𝑌 + osc(ℓ), ∥𝜀𝑟 ∥𝑌 ≤ ∥𝑥 − 𝑥 ℎ ∥ 𝑋 .
By the endpoint exactness of Proposition 8.1, ∥𝑥 − 𝑥 ℎ ∥ 𝑋 = ∥𝑢 − 𝑢 ℎ ∥ 𝛺 , so (8.18)
also follows.
Adapting Proposition 8.7 to each time interval [𝑡 𝑘 , 𝑡 𝑘+1 ], we obtain a practical
strategy for adaptive step size control. The unit constants in (8.17)–(8.18) are
notable and point to the effectiveness of the strategy. Note, however, that we have not stated any guarantee for osc(ℓ) to be small. We would need to ensure that $Y^r$ contains enough functions to provide some approximation properties before we can quantitatively characterize the smallness of osc(ℓ).
Bibliographical notes. The main ideas of this section are taken from Muñoz-Matute
et al. (2021) and Muñoz-Matute et al. (2022). Our presentation here is slightly
different and shorter. The DPG exponential integrator has also been recently
extended to nonlinear problems in Muñoz-Matute and Demkowicz (2024).

9. Duality in DPG formulations

This section is devoted to formulations that are dual in a certain sense to the hybrid
DPG formulations. We motivate the construction of the dual formulation using
overdetermined and underdetermined systems, and provide typical applications of
the dual problem, including the Aubin–Nitsche duality argument for estimating
error in weaker norms, and error bounds for goal functionals. In the DPG context,
the regularity of dual solutions can be a limiting factor. Even when all solutions of
the DPG formulation are highly regular, the dual solutions may have very limited
regularity.

9.1. Overdetermined and underdetermined equations

We have been occupied with the solution of the operator equation
𝐵𝑥 = ℓ, (9.1)
given ℓ ∈ 𝑌 ∗ and given 𝐵 : 𝑋 → 𝑌 ∗ , the operator generated by the form 𝑏(·, ·)
introduced and used in Section 3 (see e.g. (3.10)). Also using the adjoint 𝐵∗ and
the Riesz operators 𝑅 𝑋 and 𝑅𝑌 introduced there, consider the following two systems
of operator equations. The first seeks 𝑥 ∈ 𝑋 and 𝜁 ∈ 𝑌 solving
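$$
\begin{aligned}
R_Y \zeta + Bx &= \ell, \\
B^* \zeta &= 0.
\end{aligned} \tag{9.2}
$$

The second seeks $x \in X$ and $\lambda \in Y$ solving

$$
\begin{aligned}
R_X x + B^* \lambda &= 0, \\
Bx &= \ell.
\end{aligned} \tag{9.3}
$$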
The system (9.3) is related to (9.1) since its second equation is identical to (9.1).
The system (9.2) is also related to (9.1), since whenever 𝑥 solves (9.1), it also solves
(9.2) with 𝜁 = 0. Let us begin by studying in what sense these formulations are
twin relatives of the same problem (9.1).
Suppose the inf-sup condition (1.2a) holds, but we do not know if the uniqueness
condition (1.2b) holds. The inf-sup condition (1.2a) is the same as
∥𝐵𝑧∥𝑌 ∗ ≥ 𝛾∥𝑧∥ 𝑋 for all 𝑧 ∈ 𝑋, (9.4)
which is also equivalent to asserting that 𝐵 is injective and that the range of 𝐵 is closed. But we do not know if 𝐵 is surjective. Therefore we can only expect 𝐵𝑥 = ℓ to be solvable if ℓ ∈ range(𝐵). Since range 𝐵 equals the annihilator of the null space of 𝐵∗ , a necessary compatibility condition for solvability of 𝐵𝑥 = ℓ is that
ℓ(𝑦) = 0 for all 𝑦 ∈ ker(𝐵∗ ). (9.5)
For general ℓ, the equation 𝐵𝑥 = ℓ represents an overdetermined system.
Nonetheless, the inf-sup condition (9.4) immediately implies that (9.2) is uniquely
solvable, by the standard theory of mixed methods; see e.g. Brezzi and Fortin (1991)
or Ern and Guermond (2021). Since (9.2) uniquely solves for 𝑥 even when 𝐵𝑥 = ℓ
is not solvable, we may interpret (9.2) as a regularized version of 𝐵𝑥 = ℓ. Indeed,
(9.2) solves for 𝑥 satisfying
𝐵∗ 𝑅𝑌−1 𝐵𝑥 = 𝐵∗ 𝑅𝑌−1 ℓ, (9.6)
as can be seen by eliminating 𝜁 from (9.2). When (9.4) holds, (9.6) can be solved
for 𝑥 even when 𝐵𝑥 = ℓ cannot be solved.
Now suppose the adjoint inf-sup condition (1.3a) holds, but we do not know if
the adjoint uniqueness condition (1.3b) holds. Note that (1.3a) is the same as
∥𝐵∗ 𝑦∥ 𝑋∗ ≥ 𝛾∥𝑦∥𝑌 for all 𝑦 ∈ 𝑌 . (9.7)
By the Closed Range Theorem, (9.7) implies that 𝐵 is surjective, but we do not
know that 𝐵 is injective. In other words, 𝐵𝑥 = ℓ is solvable for any ℓ ∈ 𝑌 ∗ , but its
solution need not be unique in general. Hence, in this case, 𝐵𝑥 = ℓ represents an
underdetermined system.
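As in the previous case, the inf-sup condition (9.7) immediately implies, by standard mixed method theory, that (9.3) is uniquely solvable. Eliminating $\lambda$ (the first equation gives $x = -R_X^{-1} B^* \lambda$, and the second then gives $\lambda = -(B R_X^{-1} B^*)^{-1} \ell$), we find that the unique $x$ it solves for is given by

$$
x = R_X^{-1} B^* \bigl( B R_X^{-1} B^* \bigr)^{-1} \ell .
$$

This solution is orthogonal to $\ker B$ and has the least norm among all possible solutions of $Bx = \ell$.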
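A finite-dimensional analogue may help fix ideas. In the sketch below, $X$ and $Y$ are Euclidean spaces, $R_X$ and $R_Y$ are taken to be identities, and $B$ is a small real matrix; all of these are simplifying assumptions made for illustration. The regularized equation (9.6) then becomes the normal equations of least squares, and the solution of (9.3) becomes the minimum-norm solution $B^T (B B^T)^{-1} \ell$.

```python
def solve2(a11, a12, a21, a22, r1, r2):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - a21 * r1) / det

# Overdetermined case: B is 3x2 (injective, not surjective).
B = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ell = (1.0, 2.0, 0.0)                      # not in the range of B
gram = [[sum(row[i] * row[j] for row in B) for j in range(2)] for i in range(2)]
rhs = [sum(row[i] * l for row, l in zip(B, ell)) for i in range(2)]
# Analogue of (9.6) with R_Y = I:  B^T B x = B^T ell.
x_ls = solve2(gram[0][0], gram[0][1], gram[1][0], gram[1][1], rhs[0], rhs[1])
# zeta = ell - B x plays the role of the first unknown in (9.2).
zeta = [l - (row[0] * x_ls[0] + row[1] * x_ls[1]) for row, l in zip(B, ell)]
Bt_zeta = [sum(row[i] * z for row, z in zip(B, zeta)) for i in range(2)]

# Underdetermined case: B is 1x2 (surjective, not injective).
b = (3.0, 4.0)
ell2 = 5.0
bbT = sum(bi * bi for bi in b)
# Analogue of the minimum-norm formula with R_X = I:  x = B^T (B B^T)^{-1} ell.
x_mn = tuple(bi * ell2 / bbT for bi in b)
residual = sum(bi * xi for bi, xi in zip(b, x_mn)) - ell2
kernel_vec = (4.0, -3.0)                   # spans ker B
kernel_dot = sum(ki * xi for ki, xi in zip(kernel_vec, x_mn))
print(x_ls, Bt_zeta, x_mn, residual, kernel_dot)
```

In the first case $B^T \zeta = 0$ even though $Bx = \ell$ has no solution; in the second, the computed $x$ solves $Bx = \ell$ and is orthogonal to the kernel.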

9.2. Relationship to the DPG method and a dual DPG* method

Define 𝑎 : (𝑋 × 𝑌 ) × (𝑋 × 𝑌 ) → C by
𝑎((𝑥, 𝜁), (𝑧, 𝑦)) = (𝜁, 𝑦)𝑌 + 𝑏(𝑥, 𝑦) + 𝑏(𝑧, 𝜁)
for all 𝑥, 𝑧 ∈ 𝑋 and 𝜁, 𝑦 ∈ 𝑌 and suppose 𝐹 ∈ (𝑋 × 𝑌 )∗ is given. Equation (9.2) can
then be written as
𝑎((𝑥, 𝜁), (𝑧, 𝑦)) = 𝐹(𝑧, 𝑦) for all 𝑧 ∈ 𝑋, 𝑦 ∈ 𝑌 , (9.8)
with 𝐹(𝑧, 𝑦) = ℓ(𝑦). Using subspaces 𝑋ℎ ⊂ 𝑋 and 𝑌 𝑟 ⊂ 𝑌 satisfying the discrete
version of the inf-sup condition (1.2a),
$$
1 \lesssim \inf_{0 \ne z \in X_h}\ \sup_{0 \ne y \in Y^r} \frac{|b(z, y)|}{\|z\|_X \|y\|_Y}, \tag{9.9}
$$

consider the discrete problem to find 𝑥 ℎ ∈ 𝑋ℎ and 𝜁 𝑟 ∈ 𝑌 𝑟 satisfying

𝑎((𝑥 ℎ , 𝜁 𝑟 ), (𝑧, 𝑦)) = 𝐹(𝑧, 𝑦) for all 𝑧 ∈ 𝑋ℎ , 𝑦 ∈ 𝑌 𝑟 . (9.10)
From the standard theory of mixed methods in Brezzi and Fortin (1991), we obtain
quasioptimality of the method (9.10). To summarize, suppose the exact inf-sup
condition (1.2a) and the discrete inf-sup condition (9.9) hold. Then (9.8) and (9.10)
are uniquely solvable for any 𝐹 ∈ (𝑋 × 𝑌 )∗ , and their solutions satisfy

$$
\|x - x_h\|_X + \|\zeta - \zeta^r\|_Y \lesssim \inf_{z_h \in X_h,\ y^r \in Y^r} \bigl( \|x - z_h\|_X + \|\zeta - y^r\|_Y \bigr). \tag{9.11}
$$

This has implications for both the DPG method and a dual DPG* method defined
shortly. First, the DPG method, in the form of the mixed system in Theorem 6.3(b),
is a discretization of (9.2), or its equivalent form (9.8), with
𝐹(𝑧, 𝑦) = ℓ(𝑦). (9.12)
Hence, once the inf-sup conditions (1.2a) and (9.9) are verified, the DPG method in
the mixed form (6.7) can be used to regularize and solve overdetermined systems,
even when it is not possible to verify the uniqueness assumption (1.2b) or the
compatibility condition (9.5). Moreover, if 𝐵 is a continuous bijection (so that the
system is no longer overdetermined) and 𝐹 is as in (9.12), then it is easy to see that
𝜁 = 0, and that 𝜁 𝑟 = 𝜀𝑟 together with 𝑥 ℎ solves the DPG method (6.7). Then (9.11)
reduces to
$$
\|x - x_h\|_X + \|\varepsilon^r\|_Y \lesssim \inf_{z_h \in X_h} \|x - z_h\|_X , \tag{9.13}
$$

an error estimate we can also conclude from the theory in prior sections.
Next, consider dual formulations of (9.8). Since the operator generated by the
form 𝑎(·, ·) is self-adjoint, ‘dual problems’ of (9.8) take the same form as (9.8). By
a DPG* method we mean the method (9.10) for the case
𝐹(𝑧, 𝑦) = 𝑔(𝑧),
where 𝑔 ∈ 𝑋 ∗ . To distinguish from the previous case, let us now rename 𝜁 𝑟 as
𝜉 𝑟 and 𝑥 ℎ as 𝜆 ℎ . We can then rewrite (9.10) to express the DPG* method as the
method that finds 𝜉 𝑟 ∈ 𝑌 𝑟 and 𝜆 ℎ ∈ 𝑋ℎ satisfying
(𝜉 𝑟 , 𝑦)𝑌 + 𝑏(𝜆 ℎ , 𝑦) = 0 for all 𝑦 ∈ 𝑌 𝑟 , (9.14a)
𝑏(𝑧, 𝜉 𝑟 ) = 𝑔(𝑧) for all 𝑧 ∈ 𝑋ℎ . (9.14b)
Now it is evident that this is a discretization of (9.3) with the roles of 𝑋 and 𝑌
reversed, 𝐵∗ in place of 𝐵, 𝜉 in place of 𝑥, and 𝑔 in place of ℓ, that is,
(𝜉, 𝑦)𝑌 + 𝑏(𝜆, 𝑦) = 0 for all 𝑦 ∈ 𝑌 , (9.15a)
𝑏(𝑧, 𝜉) = 𝑔(𝑧) for all 𝑧 ∈ 𝑋, (9.15b)
thus revealing the connection with underdetermined systems. By verifying the
exact same inf-sup conditions as for the DPG method, namely (1.2a) and (9.9), the estimate (9.11) then gives that the DPG* method is uniquely solvable and the solution satisfies

$$
\|\lambda - \lambda_h\|_X + \|\xi - \xi^r\|_Y \lesssim \inf_{z_h \in X_h,\ y^r \in Y^r} \bigl( \|\lambda - z_h\|_X + \|\xi - y^r\|_Y \bigr). \tag{9.16}
$$

An important difference between the DPG* estimate (9.16) and the DPG estimate
(9.13) is that convergence in (9.16) depends on the regularity of an extraneous
Lagrange multiplier 𝜆.

9.3. Error in goal functionals

A typical application of duality is in characterizing the error in a goal functional or
in goal-oriented adaptivity. Let 𝐺 be a continuous linear functional on 𝑋 such that
𝐺(𝑥) represents a goal of interest that depends on the solution 𝑥. After computing
𝑥 ℎ by the DPG method, we obtain an approximate goal 𝐺(𝑥 ℎ ). We are interested in
bounding the error 𝐺(𝑥) − 𝐺(𝑥 ℎ ). The dual DPG* formulation is useful
in this context.
Theorem 9.1 (Error in goal functional). Let 𝑥 ∈ 𝑋 solve (1.1) and 𝑥 ℎ ∈ 𝑋ℎ
solve the DPG discretization (5.4). Let 𝜉 ∈ 𝑌 and 𝜉 𝑟 ∈ 𝑌 𝑟 be as in the DPG*
formulations (9.15) and (9.14) with 𝑔(𝑧) = 𝐺(𝑧). Then the error in the goal
functional is given by
𝐺(𝑥) − 𝐺(𝑥 ℎ ) = 𝑏(𝑥 − 𝑥 ℎ , 𝜉 − 𝜉 𝑟 ). (9.17)
Proof. First note that
𝑏(𝑥 − 𝑥 ℎ , 𝜉 𝑟 ) = −(𝜀𝑟 , 𝜉 𝑟 )𝑌 by subtracting (6.7a) from (1.1)
= −𝑏(𝜆 ℎ , 𝜀𝑟 ) by (9.14a)
=0 by (6.7b). (9.18)
Hence
𝐺(𝑥 − 𝑥 ℎ ) = 𝑏(𝑥 − 𝑥 ℎ , 𝜉) by (9.15b) (9.19)
= 𝑏(𝑥 − 𝑥 ℎ , 𝜉 − 𝜉 𝑟 ),
and the result follows.
An identity analogous to (9.17) holds for the error in the goal when using the
standard Galerkin method, where we have the additional freedom to choose one
of 𝑥 ℎ or 𝜉 𝑟 arbitrarily from the corresponding finite element space. An analogous
freedom exists in the DPG case as well. Since subtracting (5.4) from (1.1) gives
𝑏(𝑥 − 𝑥 ℎ , 𝑦 ℎ ) = 0 for all 𝑦 ℎ ∈ 𝑇 𝑟 (𝑋ℎ ), we may combine it with (9.19) to obtain
𝐺(𝑥) − 𝐺(𝑥 ℎ ) = 𝑏(𝑥 − 𝑥 ℎ , 𝜉 − 𝑇 𝑟 𝑤 ℎ ), (9.20)
an identity that holds for any 𝑤 ℎ in 𝑋ℎ . Nonetheless, while obtaining convergence
rates from either (9.20) or (9.17), the limiting factor is usually the regularity of the
dual solution.

9.4. Aubin–Nitsche argument for DPG methods

Aubin–Nitsche duality arguments are typically used in finite element methods to
prove higher rates of convergence in weaker norms. We present such an argument,
adopting the general hybrid setting of (4.12) and Theorem 4.3, where $X = X_0 \times \hat X$ and the solution takes the form $(x, \hat x)$ with $x \in X_0$ and $\hat x \in \hat X$. We return to our standard setting where one of (1.1), (1.2) or (1.3) holds, that is, $B$ is a bijection (so we are no longer considering overdetermined or underdetermined systems). We limit ourselves to showing how a duality argument can potentially yield higher rates of convergence for the solution component $x$ in $X_0$.
Recall the equivalent mixed form of the DPG method given in (6.7). We rewrite
it using the composite sesquilinear form 𝑎(·, ·), which in the hybrid case takes the
form
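$$
a((x, \hat x, \varepsilon), (z, \hat z, y)) = (\varepsilon, y)_Y + b((x, \hat x), y) + b((z, \hat z), \varepsilon)
$$

for all $x, z \in X_0$, $\hat x, \hat z \in \hat X$ and $\varepsilon, y \in Y$. The system (6.7) can be reformulated as the problem of finding $x_h \in X_{h,0} \subset X_0$, $\hat x_h \in \hat X_h \subset \hat X$ and $\varepsilon^r \in Y^r$ satisfying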
𝑎((𝑥 ℎ , 𝑥ˆ ℎ , 𝜀𝑟 ), (𝑧 ℎ , 𝑧ˆ ℎ , 𝑦 𝑟 )) = ℓ(𝑦 𝑟 ) (9.21)
for all 𝑧 ℎ ∈ 𝑋ℎ , 𝑧ˆ ℎ ∈ 𝑋ˆ ℎ,0 , 𝑦 𝑟 ∈ 𝑌 𝑟 . The undiscretized version of this equation is
to find $(x, \hat x) \in X$ such that

$$
a((x, \hat x, 0), (z, \hat z, y)) = \ell(y) \tag{9.22}
$$

for all $z \in X_0$, $\hat z \in \hat X$, $y \in Y$. Recall that $\zeta$ in (9.8) equals zero when $B$ is a bijection. Obviously, (9.22) is equivalent to (4.13). Since the operator generated by the form $a(\cdot,\cdot)$ is self-adjoint, dual problems take the same form, with the roles of test and trial functions reversed.
To detail a specific dual problem of interest, suppose 𝐿 and 𝑍 are Hilbert spaces
such that the embeddings
𝑍 ⊆ 𝑋 × 𝑌 and 𝑋0 ⊆ 𝐿 are continuous. (9.23a)
For any 𝑔 ∈ 𝐿, we consider the ‘dual problem’ for
𝜉𝑔 = (𝑥 𝑔 , 𝑥ˆ𝑔 , 𝜀 𝑔 ) ∈ 𝑋0 × 𝑋ˆ × 𝑌
satisfying
$$
a((z, \hat z, y), \xi_g) = (z, g)_L \qquad \text{for all } z \in X_0,\ \hat z \in \hat X,\ y \in Y. \tag{9.23b}
$$

The right-hand side is a continuous linear functional on $X$ by (9.23a). Suppose there is a $c(h) > 0$ such that for any $g \in L$, there is an $x_g \in L$ and $\varepsilon_g \in Y$ satisfying (9.23b) and

$$
\inf_{\varsigma_h \in X_h \times Y^r} \|\xi_g - \varsigma_h\|_{X \times Y} \le c(h)\, \|g\|_L . \tag{9.23c}
$$

In examples, one would want to leverage regularity of the dual solution, if available,
to verify (9.23c) and obtain some 𝑐(ℎ) that goes to zero as ℎ decreases.

Theorem 9.2 (Duality argument for DPG formulations). Assume the setting
of (9.23) and (4.12). Then
$$
\|x - x_h\|_L \le c(h)\, \|a\|\, \|(x, \hat x, 0) - (x_h, \hat x_h, \varepsilon^r)\|_{X_0 \times \hat X \times Y} . \tag{9.24}
$$

Proof. Subtracting (9.22) from (9.21),

𝑎((𝑥 − 𝑥 ℎ , 𝑥ˆ − 𝑥ˆ ℎ , 0 − 𝜀𝑟 ), (𝑧 ℎ , 𝑧ˆ ℎ , 𝑦 𝑟 )) = 0 (9.25)
for all 𝑧 ℎ ∈ 𝑋ℎ , 𝑧ˆ ℎ ∈ 𝑋ˆ ℎ,0 , 𝑦 𝑟 ∈ 𝑌 𝑟 . Next, we use (9.23b) with 𝑔 = 𝑥 −𝑥 ℎ ∈ 𝑋0 ⊆ 𝐿,
𝑧 = 𝑥 − 𝑥 ℎ , 𝑧ˆ = 𝑥ˆ − 𝑥ˆ ℎ and 𝑦 = −𝜀𝑟 , to get
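$$
\begin{aligned}
\|x - x_h\|_L^2 &= a((x - x_h, \hat x - \hat x_h, -\varepsilon^r), \xi_g) \\
&= a((x - x_h, \hat x - \hat x_h, -\varepsilon^r), \xi_g - \varsigma_h) && \text{by (9.25)} \\
&\le \|a\|\, \|(x - x_h, \hat x - \hat x_h, \varepsilon^r)\|_{X_0 \times \hat X \times Y}\, \|\xi_g - \varsigma_h\|_{X_0 \times \hat X \times Y}
\end{aligned}
$$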

for any 𝜍 ℎ ∈ 𝑋0,ℎ × 𝑋ˆ ℎ × 𝑌 𝑟 . Hence the result follows from (9.23c).

Example 9.3 (The dual of a primal DPG formulation on a convex domain).

The primal DPG method for the Laplace equation of Example 5.5 offers a simple
example of how one can determine the regularity of the dual solutions, assuming
that the domain 𝛺 is convex. Recall that there we have set 𝑋0 = 𝐻˚ 1 (𝛺), 𝑋ˆ =
𝐻 −1/2 (𝜕𝛺 ℎ ) and 𝑌 = 𝐻 1 (𝛺 ℎ ). Additionally, set
𝐿 = 𝐿 2 (𝛺), 𝑍 = (𝐻 2 (𝛺) ∩ 𝑋0 ) × 𝑋ˆ × (𝐻 2 (𝛺) ∩ 𝑌 ).

Then (9.23a) is obvious. The dual problem (9.23b) for 𝜉𝑔 = (𝑥 𝑔 , 𝑥ˆ𝑔 , 𝜀 𝑔 ) ∈ 𝐻˚ 1 (𝛺) ×
𝐻 −1/2 (𝜕𝛺 ℎ ) × 𝐻 1 (𝛺 ℎ ), after complex conjugations as needed, reads as follows:
(𝜀 𝑔 , 𝑦)𝑌 + (grad 𝑥 𝑔 , grad 𝑦)ℎ − ⟨𝑥ˆ𝑔 , 𝑦⟩ℎ = 0, (9.26a)
(grad 𝜀 𝑔 , grad 𝑧)ℎ = (𝑔, 𝑧)𝛺 , (9.26b)
⟨𝑧ˆ, 𝜀 𝑔 ⟩ℎ = 0 (9.26c)

for all 𝑦 ∈ 𝐻 1 (𝛺 ℎ ), 𝑧 ∈ 𝐻˚ 1 (𝛺) and 𝑧ˆ ∈ 𝐻 −1/2 (𝜕𝛺 ℎ ).

We need to understand the regularity of solutions of (9.26). First, note that the
𝜀 𝑔 component in 𝐻 1 (𝛺 ℎ ) is actually in 𝐻˚ 1 (𝛺), as seen from (9.26c) after applying
Theorem 4.6(a). Together with (9.26b), we conclude that
−Δ𝜀 𝑔 = 𝑔 on 𝛺, (9.27a)
𝜀𝑔 = 0 on 𝜕𝛺. (9.27b)

Next, observe that equation (9.26a) with 𝑦 ∈ 𝐻˚ 1 (𝛺) yields

(grad 𝑥 𝑔 , grad 𝑦) = −(𝜀 𝑔 , 𝑦)ℎ − (grad 𝜀 𝑔 , grad 𝑦)ℎ = −(𝜀 𝑔 , 𝑦)ℎ + (Δ𝜀 𝑔 , 𝑦)ℎ ,
which implies Δ𝑥 𝑔 = 𝜀 𝑔 + 𝑔. Finally, using the equations for 𝑥 𝑔 and 𝜀 𝑔 in (9.26a)
and integrating by parts, we find ⟨𝑥ˆ𝑔 , 𝑦⟩ℎ = ⟨𝑛 · grad(𝜀 𝑔 + 𝑥 𝑔 ), 𝑦⟩ℎ . Hence

Δ𝑥 𝑔 = 𝜀 𝑔 + 𝑔 on 𝛺, (9.27c)
𝑥𝑔 = 0 on 𝜕𝛺, (9.27d)
𝑥ˆ𝑔 = 𝑛 · grad(𝜀 𝑔 + 𝑥 𝑔 ) on 𝜕𝐾, for all 𝐾 ∈ 𝛺 ℎ . (9.27e)
At this point we are able to use the well-known full regularity of the Dirichlet
problem on a convex domain (see e.g. Grisvard 1985), to conclude that
∥𝜀 𝑔 ∥ 𝐻 2 (𝛺) ≲ ∥𝑔∥ 𝛺 ,
∥𝑥 𝑔 ∥ 𝐻 2 (𝛺) ≲ ∥𝜀 𝑔 ∥ 𝛺 + ∥𝑔∥ 𝛺 ≲ ∥𝑔∥ 𝛺 ,
which in turn also implies that the interface variable satisfies
∥ 𝑥ˆ𝑔 ∥ 𝐻 −1/2 (𝜕𝛺ℎ ) ≤ ∥ grad(𝜀 𝑔 + 𝑥 𝑔 )∥ 𝐻(div,𝛺)
= ∥ grad(𝜀 𝑔 + 𝑥 𝑔 )∥ 𝛺 + ∥Δ(𝜀 𝑔 + 𝑥 𝑔 )∥ 𝛺 ≲ ∥𝑔∥ 𝛺 .
Hence we have shown the regularity estimate
∥𝜉𝑔 ∥ 𝑍 = ∥(𝑥 𝑔 , 𝑥ˆ𝑔 , 𝜀 𝑔 )∥ 𝑍 ≲ ∥𝑔∥ 𝛺 . (9.28)
To complete the verification of (9.23c), we now only need to bound approxima-
tion errors. By an application of the Bramble–Hilbert lemma as in Example 5.5, it is
easy to show that there is an interpolant 𝜉𝑔,ℎ ≡ (𝑥 𝑔,ℎ , 𝑥ˆ𝑔,ℎ , 𝜀 𝑔,ℎ ) of 𝜉𝑔 = (𝑥 𝑔 , 𝑥ˆ𝑔 , 𝜀 𝑔 )
such that

\[ \|\xi_g - \xi_{g,h}\|_{X_0 \times \hat{X} \times Y} \lesssim h \big( \|\varepsilon_g\|_{H^2(\Omega)} + \|x_g\|_{H^2(\Omega)} \big) \lesssim h \|g\|_{\Omega}, \]
where the last inequality followed from (9.28). This verifies (9.23c) with 𝑐(ℎ) = ℎ.
Applying Theorem 9.2, we obtain

\[ \|u - u_h\|_{\Omega} \le C h \big( \|u - u_h\|_{H^1(\Omega)} + \|\hat{q}_n - \hat{q}_{n,h}\|_{H^{-1/2}(\partial\Omega_h)} \big), \]
which shows that on a convex domain we expect to obtain an 𝐿 2 -convergence rate
of one higher order for 𝑢 ℎ than the 𝐻 1 -rate we proved earlier in (5.25).
Bibliographical notes. The DPG* method was introduced in Demkowicz, Gopala-
krishnan and Keith (2020), motivated by the LL∗ method (or the ‘FOSLL∗ method’)
of Cai, Manteuffel, McCormick and Ruge (2001). Numerical experiments in Dem-
kowicz et al. (2020, § 5.3) include a case where the 𝜆 in (9.16) is in 𝐻 3 (𝛺) while
𝑥 is much more regular. It confirms that both the DPG* and the LL∗ methods
have convergence rates that are limited by the regularity of 𝜆. The argument of
Theorem 9.1 can be found in Keith (2018). Such arguments are leveraged for
goal-oriented adaptivity in Keith et al. (2019). Theorem 9.2 and its application
to the primal DPG formulation in Example 9.3 are from Bouma et al. (2014). A
further example applying the duality argument to an ultraweak DPG formulation
can be found in Führer (2018).

10. Pointers to DPG techniques for nonlinear problems

The application of DPG ideas to nonlinear problems is an active area of current
research. In this section we discuss a DPG extension to nonlinear problems by the
steepest descent method. First, however, we quickly give pointers to existing liter-
ature containing various other ways of utilizing DPG ideas in nonlinear problems.

10.1. Prior literature

A natural avenue for dealing with nonlinearities is the use of Newton–Raphson
iterations, which linearize the nonlinear problem and apply the prior DPG ideas
to the linearized problem. Many have adopted this avenue (Chan, Demkowicz
and Moser 2014a, Roberts, Demkowicz and Moser 2015), and this approach is
related to the classical Gauss–Newton method mentioned below in Section 10.3. A
PDE-constrained residual minimization problem for solving nonlinear systems was
formulated in Bui-Thanh and Ghattas (2014). They combined it with a trust-region
inexact Newton conjugate gradient iteration to solve two-dimensional Burgers and
Euler equations.
A nonlinear mixed problem cast in the residual minimization DPG framework
can be found in Carstensen, Bringmann, Hellwig and Wriggers (2018) for a model
nonlinear diffusion problem. They studied it in the context of the primal formulation
and lowest-order approximations; their work includes a priori and a posteriori
error estimates as well as an equivalent least-squares formulation, and is illustrated
with two-dimensional examples involving adaptivity. A different approach for the
same model problem was taken by Cantin and Heuer (2018), who, by introducing
additional unknowns, reformulated the nonlinear problem as a linear one with a
nonlinear algebraic constraint. The DPG technology is then used only for the linear
problem, and the nonlinear constraint is enforced by penalization. The resulting
system is an extension of the mixed form (6.7) of the DPG method to a saddle-
point formulation with a strongly monotone diagonal block that is wellposed under
appropriate conditions. Nonlinearities in the same block also arise in the work of
Muga and van der Zee (2020) on residual minimization without a Hilbert structure
through Banach duality maps.
Other lines of DPG research involving nonlinearities include problems charac-
terized by variational inequalities such as contact problems in elasticity. Führer,
Heuer and Stephan (2018a) have developed a DPG theory for (scalar) Signorini-
type problems, where optimal test functions are used for discretizing the partial
differential operator of the problem and duality terms are added to incorporate the
nonlinear boundary conditions. This yields a variational inequality of the first kind.
By using an ultraweak formulation they have direct access to normal derivatives
through one of the trace variables (unlike standard weak formulations). They also
derived reliable error estimators consisting of an error representation as in previ-
ous sections, plus a duality term measuring the violation of the complementarity
condition.

10.2. Steepest descent iteration

We now discuss how to extend the residual minimization methodology to general
nonlinear problems in the framework of the steepest descent method, borrowed
from Li (2024). The same technique was applied much earlier by Bristeau et al.
(1979, 1985) to solve the challenging transonic flow problem. Although nonlinear
problems deserve to be set in Banach spaces, for simplicity we limit ourselves to
the Hilbert space setting. Unlike the remainder of this paper, here we assume that
all spaces are over R, and that 𝐵 : 𝑋 → 𝑌 ∗ is a nonlinear operator generated by a
form 𝑏(𝑥; 𝑦) which is nonlinear in 𝑥 and linear in 𝑦 via
𝐵(𝑥)(𝑦) = 𝑏(𝑥; 𝑦) (10.1)
for any 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌 .
We are interested in solving the nonlinear analogue of (1.1) using the 𝑏 in (10.1),
which we now recast as the problem of approximating a minimizer of
\[ \min_{x \in X} \frac{1}{2} \|\ell - B(x)\|_{Y^*}^2, \tag{10.2} \]

given some ℓ ∈ 𝑌 ∗ . Define nonlinear maps 𝐶 : 𝑋 → 𝑌 and 𝐽 : 𝑋 → R by

$C(x) = R_Y^{-1}(\ell - B(x))$ and $J(x) = \tfrac{1}{2} \|C(x)\|_Y^2$. Then finding a minimizer in (10.2) is
the same as finding
\[ x = \arg\min_{w \in X} J(w). \tag{10.3} \]

Here we have used the isometry of the Riesz map 𝑅𝑌 in (3.1).
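In a discrete setting, the objective 𝐽 can be evaluated with Gram matrices. The following is only a minimal sketch under assumptions not made in the text: the spaces are finite-dimensional, the 𝑌 inner product is represented by a symmetric positive definite Gram matrix `G_Y`, functionals in 𝑌 ∗ are stored as coefficient vectors, and the operator `B` and data `ell` are made up for illustration. The helper names `riesz_inverse` and `objective` are hypothetical. It shows that the dual norm of the residual and the 𝑌-norm of 𝐶(𝑥) give the same value of 𝐽.

```python
import numpy as np

def riesz_inverse(G, functional):
    """Apply the inverse Riesz map: solve G c = functional for the representative c."""
    return np.linalg.solve(G, functional)

def objective(B, ell, G_Y, x):
    """J(x) = 1/2 ||ell - B(x)||_{Y*}^2, evaluated through C(x) = R_Y^{-1}(ell - B(x))."""
    residual = ell - B(x)               # element of Y*, stored as a coefficient vector
    c = riesz_inverse(G_Y, residual)    # C(x), an element of Y
    return 0.5 * residual @ c           # duality pairing of the residual with C(x)

# Illustrative data (assumed, not taken from the text).
G_Y = np.array([[2.0, 0.5], [0.5, 1.0]])
ell = np.array([1.0, -1.0])
B = lambda x: np.array([x[0] + x[1] ** 2, np.sin(x[0]) + 2.0 * x[1]])
x = np.array([0.3, -0.2])

c = riesz_inverse(G_Y, ell - B(x))
J_dual = objective(B, ell, G_Y, x)      # via the dual norm of the residual
J_primal = 0.5 * c @ G_Y @ c            # via the Y-norm of C(x)
```

The agreement of `J_dual` and `J_primal` is the discrete counterpart of the Riesz isometry.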

To compute (Gateaux) derivatives of these nonlinear maps, we use the following
notation. For any normed linear spaces 𝑈, 𝑉 and 𝐹 : 𝑈 → 𝑉, we write
\[ dF_u(z) \equiv dF_u\, z = \lim_{t \to 0} \frac{F(u + tz) - F(u)}{t}, \]
for any 𝑢 and 𝑧 in 𝑈, if the limit exists in the topology of 𝑉 and results in a
continuous linear operator 𝑑𝐹𝑢 : 𝑈 → 𝑉. We proceed assuming that for the
previously introduced maps 𝐽, 𝐶 and 𝐵, the derivatives 𝑑𝐽 𝑥 : 𝑋 → R, 𝑑𝐵 𝑥 : 𝑋 → 𝑌 ∗
and 𝑑𝐶 𝑥 : 𝑋 → 𝑌 exist at any 𝑥 ∈ 𝑋. Note that by definition 𝑑𝐽 𝑥 is in 𝑋 ∗ (which
consists of continuous linear, not antilinear, functionals since 𝑋 is now over R).
The steepest descent iteration to approximate (10.3) uses the gradient of 𝐽, which
is an endomorphism ∇𝐽 : 𝑋 → 𝑋 defined by
(∇𝐽)(𝑥) = 𝑅 𝑋−1 𝑑𝐽 𝑥 . (10.4)
Given an initial iterate 𝑥 0 ∈ 𝑋, the iteration produces 𝑥 𝑛 by
𝑥 𝑛+1 = 𝑥 𝑛 − 𝛼 (∇𝐽)(𝑥 𝑛 ), 𝑛 = 0, 1, 2, . . . , (10.5)
where 0 < 𝛼 ≤ 1 is the step size. Let us compute ∇𝐽. Recall our notation for
the adjoint of a linear operator 𝑀 : 𝑋 → 𝑌 ∗ , namely 𝑀 ∗ : 𝑌 → 𝑋 ∗ , obtained after
identifying the bidual of a Hilbert space with itself, as already mentioned in (3.9),
namely
(𝑀 ∗ 𝑦)(𝑧) = (𝑀 𝑧)(𝑦) (10.6)
for any 𝑦 ∈ 𝑌 and 𝑧 ∈ 𝑋 (with no conjugation since the spaces are now over R).
Proposition 10.1. In the above setting, for any 𝑥 ∈ 𝑋,
(∇𝐽)(𝑥) = −𝑅 𝑋−1 (𝑑𝐵 𝑥 )∗ 𝑅𝑌−1 (ℓ − 𝐵(𝑥)).
Proof. For any 𝑥, 𝑧 ∈ 𝑋, since $dC_x z = (d R_Y^{-1}(\ell - B(x)))(z) = -R_Y^{-1}\, dB_x z$ and
$J(x) = \tfrac{1}{2} (C(x), C(x))_Y$,
\begin{align*}
dJ_x z &= (dC_x z, C(x))_Y = (-R_Y^{-1}\, dB_x z, C(x))_Y \\
&= -(dB_x z)(C(x)) = -((dB_x)^* C(x))(z)
\end{align*}
by (10.6). Now the result follows from (10.4).
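The gradient formula of Proposition 10.1 can be checked numerically in a small discrete setting. The sketch below rests on assumptions that are not part of the text: two-dimensional spaces, made-up symmetric positive definite Gram matrices `G_X` and `G_Y`, and a toy nonlinear operator `B` with Jacobian `dB`. It compares the pairing (∇𝐽(𝑥), 𝑧)𝑋 computed from the formula with a central difference quotient of 𝐽 (a symmetric variant of the one-sided limit in the definition of the derivative).

```python
import numpy as np

# Illustrative data (assumed): Gram matrices, a toy nonlinear operator and a load.
G_X = np.array([[2.0, 0.0], [0.0, 1.0]])
G_Y = np.array([[1.5, 0.2], [0.2, 1.0]])
A_lin = np.array([[2.0, 0.5], [0.3, 1.5]])
ell = np.array([1.0, -0.5])

def B(x):
    """Toy nonlinear operator X -> Y* (coefficient vectors)."""
    return A_lin @ x + 0.1 * np.sin(x)

def dB(x):
    """Jacobian of B at x, the discrete counterpart of dB_x."""
    return A_lin + 0.1 * np.diag(np.cos(x))

def J(x):
    """J(x) = 1/2 ||ell - B(x)||_{Y*}^2."""
    r = ell - B(x)
    return 0.5 * r @ np.linalg.solve(G_Y, r)

def grad_J(x):
    """Gradient from Proposition 10.1: -R_X^{-1} (dB_x)^* R_Y^{-1} (ell - B(x))."""
    c = np.linalg.solve(G_Y, ell - B(x))
    return -np.linalg.solve(G_X, dB(x).T @ c)

x = np.array([0.4, -0.7])
z = np.array([1.0, 2.0])
t = 1e-6

finite_difference = (J(x + t * z) - J(x - t * z)) / (2.0 * t)
from_formula = grad_J(x) @ G_X @ z      # (grad J(x), z)_X, which equals dJ_x z
```

The two numbers agree up to the discretization error of the difference quotient, which is a useful sanity check before implementing the iteration itself.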
By Proposition 10.1, the steepest descent iteration becomes
𝑥 𝑛+1 = 𝑥 𝑛 + 𝛼𝑅 𝑋−1 (𝑑𝐵 𝑥𝑛 )∗ 𝑅𝑌−1 (ℓ − 𝐵(𝑥 𝑛 )), 𝑛 = 0, 1, . . . . (10.7)
It is applicable for DPG formulations when the inverse of the Gram matrices of
both the 𝑋 and the 𝑌 inner products can be efficiently applied; see the discussion
in Section 10.4.
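To make the iteration (10.7) concrete, here is a minimal sketch in a finite-dimensional setting. Everything specific in it is an assumption for illustration: the Gram matrices `G_X` and `G_Y`, the toy operator `B` with Jacobian `dB`, the fixed step size `alpha = 0.3` and the iteration count. The function name `steepest_descent` is hypothetical. Each step applies the two inverse Riesz maps as linear solves with the Gram matrices.

```python
import numpy as np

def steepest_descent(B, dB, ell, G_X, G_Y, x0, alpha=0.3, n_iter=500):
    """Iterate x_{n+1} = x_n + alpha R_X^{-1} (dB_{x_n})^* R_Y^{-1} (ell - B(x_n))."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        c = np.linalg.solve(G_Y, ell - B(x))        # R_Y^{-1}(ell - B(x_n))
        update = np.linalg.solve(G_X, dB(x).T @ c)  # R_X^{-1} (dB_{x_n})^* c
        x = x + alpha * update
    return x

# Toy problem (assumed): a mildly nonlinear perturbation of a linear bijection.
A_lin = np.array([[2.0, 0.5], [0.3, 1.5]])
B = lambda x: A_lin @ x + 0.1 * np.sin(x)
dB = lambda x: A_lin + 0.1 * np.diag(np.cos(x))
G_X = np.diag([2.0, 1.0])
G_Y = np.eye(2)

x_true = np.array([1.0, -0.5])
ell = B(x_true)                                     # manufactured load
x_computed = steepest_descent(B, dB, ell, G_X, G_Y, x0=np.zeros(2))
error = np.linalg.norm(x_computed - x_true)
```

The step size is fixed here only for simplicity; for this toy problem it is small enough that the residual decreases at every step.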
Example 10.2 (Specialization to the linear case). Suppose the operator 𝐵 in
(10.1) is a linear continuous bijection. Then 𝑑𝐵 𝑥 = 𝐵 is independent of 𝑥. By
Proposition 10.1, the steepest descent iteration (10.5) then becomes
𝑥 𝑛+1 = 𝑥 𝑛 + 𝛼𝑅 𝑋−1 𝐵∗ 𝑅𝑌−1 (ℓ − 𝐵𝑥 𝑛 ).
If 𝑥 is the exact solution of (1.1), then this can be rewritten as
\[ x - x_{n+1} = \big( I - \alpha R_X^{-1} B^* R_Y^{-1} B \big) (x - x_n). \]

Consequently, if the error-reducing operator satisfies

∥𝐼 − 𝛼𝑅 𝑋−1 𝐵∗ 𝑅𝑌−1 𝐵∥ ≤ 𝑞 < 1, (10.8)
where the norm is the induced operator norm in 𝑋, we have a contraction which
guarantees the convergence of the iterations. Note that the operator 𝑅 𝑋−1 𝐵∗ 𝑅𝑌−1 𝐵 is
self-adjoint. Indeed, we have for any 𝑧, 𝑤 ∈ 𝑋,
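\[ \big( R_X^{-1} B^* R_Y^{-1} B z, w \big)_X = \big\langle B^* R_Y^{-1} B z, w \big\rangle_X = \big\langle B w, R_Y^{-1} B z \big\rangle_Y = \big( R_Y^{-1} B w, R_Y^{-1} B z \big)_Y. \]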

Since the last expression is symmetric in 𝑧 and 𝑤, the self-adjointness follows.

Additionally, we find that the operator norm in (10.8) is the same as
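\begin{align*}
\big\| I - \alpha R_X^{-1} B^* R_Y^{-1} B \big\| &= \sup_{w \in X,\ \|w\|_X = 1} \big| \big( w - \alpha R_X^{-1} B^* R_Y^{-1} B w, w \big)_X \big| \\
&= \sup_{w \in X,\ \|w\|_X = 1} \big| \|w\|_X^2 - \alpha \|B w\|_{Y^*}^2 \big|.
\end{align*}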

Using ∥𝑏∥ and the inf-sup constant 𝛾 introduced in Section 1, we know that for any
𝑤 ∈ 𝑋,
𝛾∥𝑤∥ 𝑋 ≤ ∥𝐵𝑤∥𝑌 ∗ ≤ ∥𝑏∥ ∥𝑤∥ 𝑋 ,
which implies
\[ (\alpha \gamma^2 - 1) \|w\|_X^2 \le \alpha \|B w\|_{Y^*}^2 - \|w\|_X^2 \le (\alpha \|b\|^2 - 1) \|w\|_X^2. \]
This shows that the sufficient condition (10.8) for convergence can be met if
−𝑞 ≤ 𝛼𝛾 2 − 1 and 𝛼∥𝑏∥ 2 − 1 ≤ 𝑞,
or equivalently
\[ \frac{1 - q}{\gamma^2} \le \alpha \le \frac{1 + q}{\|b\|^2}. \]
The smallest possible contraction constant
\[ q = \frac{\|b\|^2 - \gamma^2}{\|b\|^2 + \gamma^2} = \frac{C^2 - 1}{C^2 + 1}, \quad \text{where } C = \frac{\|b\|}{\gamma}, \]
is achieved when the lower and upper bounds for 𝛼 coincide. To summarize,
selecting any 𝑞 such that (𝐶 2 − 1)/(𝐶 2 + 1) ≤ 𝑞 < 1 and setting any 𝛼 satisfying
(1 − 𝑞)/𝛾 2 ≤ 𝛼 ≤ (1 + 𝑞)/∥𝑏∥ 2 , the steepest descent iterations with such an 𝛼
converge, and the error-reducing operator is a contractive map with a contraction
constant no larger than 𝑞.
If 𝑋 and 𝑌 are endowed with optimal norms that make 𝑏(·, ·) into a generalized
duality pairing as in Definition 3.5, then 𝛾 = ∥𝑏∥ = 𝐶 = 1. Hence 𝑞 = 0 and the
steepest descent method with 𝛼 = 1 delivers the DPG solution in just one step,
independently of the initial iterate.
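The quantities in Example 10.2 can be computed explicitly for matrices. The sketch below assumes a finite-dimensional setting with made-up data: 𝐵 is a square invertible matrix `Bmat`, and `G_X`, `G_Y` are symmetric positive definite Gram matrices. The function name `contraction_data` is hypothetical. It obtains 𝛾² and ∥𝑏∥² as the extreme eigenvalues of the symmetrized operator, picks the step size at which the two bounds for 𝛼 coincide, and compares the induced 𝑋-operator norm of the error-reducing operator with 𝑞.

```python
import numpy as np

def contraction_data(Bmat, G_X, G_Y):
    """Return (alpha, q, op_norm) for the linear steepest descent iteration."""
    M = Bmat.T @ np.linalg.solve(G_Y, Bmat)          # matrix of B^* R_Y^{-1} B
    L = np.linalg.cholesky(G_X)                      # G_X = L L^T
    S = np.linalg.solve(L, np.linalg.solve(L, M).T)  # L^{-1} M L^{-T}
    S = 0.5 * (S + S.T)                              # remove round-off asymmetry
    lam = np.linalg.eigvalsh(S)                      # ascending eigenvalues
    gamma_sq, norm_b_sq = lam[0], lam[-1]            # gamma^2 and ||b||^2
    alpha = 2.0 / (norm_b_sq + gamma_sq)             # bounds for alpha coincide
    q = (norm_b_sq - gamma_sq) / (norm_b_sq + gamma_sq)
    # X-operator norm of I - alpha R_X^{-1} B^* R_Y^{-1} B equals the 2-norm of I - alpha S.
    op_norm = np.linalg.norm(np.eye(len(lam)) - alpha * S, 2)
    return alpha, q, op_norm

# Illustrative data (assumed).
Bmat = np.array([[2.0, 0.5], [0.3, 1.5]])
G_X = np.array([[2.0, 0.5], [0.5, 1.0]])
G_Y = np.array([[1.5, 0.2], [0.2, 1.0]])
alpha, q, op_norm = contraction_data(Bmat, G_X, G_Y)

# Optimal norm on X: the Gram matrix is B^* R_Y^{-1} B itself, so gamma = ||b|| = 1.
G_opt = Bmat.T @ np.linalg.solve(G_Y, Bmat)
alpha_opt, q_opt, _ = contraction_data(Bmat, G_opt, G_Y)
```

With the optimal Gram matrix the sketch returns a step size of one and a contraction constant of zero up to round-off, matching the one-step convergence noted above.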

10.3. Relation with the Gauss–Newton method

It is useful to compare (10.7) with the Gauss–Newton iterations (see e.g. Nocedal
and Wright 2006), which are obtained by linearizing 𝐵(𝑥) = ℓ around the current
iterate. Namely, if 𝑥 𝑛 is a current iterate, then Δ𝑥 = 𝑥 𝑛+1 − 𝑥 𝑛 is obtained from the
approximation 𝐵(𝑥 𝑛 + Δ𝑥) ≈ 𝐵(𝑥 𝑛 ) + 𝑑𝐵 𝑥𝑛 Δ𝑥 by requiring
𝐵(𝑥 𝑛 ) + 𝑑𝐵 𝑥𝑛 Δ𝑥 = ℓ.
One can bring in DPG techniques to solve for the increment by minimizing the
residual, that is, by finding
\[ \Delta x = \arg\min_{w \in X} \frac{1}{2} \|B(x_n) + dB_{x_n} w - \ell\|_{Y^*}. \tag{10.9} \]

As we have shown previously (see e.g. (9.6)), an equivalent way to compute this
minimizer is by solving
(𝑑𝐵 𝑥𝑛 )∗ 𝑅𝑌−1 𝑑𝐵 𝑥𝑛 Δ𝑥 = (𝑑𝐵 𝑥𝑛 )∗ 𝑅𝑌−1 (ℓ − 𝐵(𝑥 𝑛 )). (10.10)

This is solvable when 𝑑𝐵 𝑥𝑛 satisfies the inf-sup condition. Comparing (10.10) with
the increment $\widetilde{\Delta x}$ of the steepest descent iteration (10.7) with unit step size
𝛼 = 1, which solves
\[ R_X \widetilde{\Delta x} = (dB_{x_n})^* R_Y^{-1} (\ell - B(x_n)), \]
we see that the two iterations coincide with each other provided
𝑅 𝑋 = (𝑑𝐵 𝑥𝑛 )∗ 𝑅𝑌−1 𝑑𝐵 𝑥𝑛 ,
that is, provided we use the step-dependent energy norm
|||Δ𝑢||| 𝑋 = ∥𝑑𝐵 𝑥𝑛 Δ𝑢∥𝑌 ∗
for the space 𝑋 (which, of course, is a norm when the linearized operator 𝑑𝐵 𝑥𝑛
satisfies the inf-sup condition).
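The coincidence of the two increments can be observed on a small example. This is a sketch under assumptions: finite-dimensional spaces, a made-up Gram matrix `G_Y`, and a toy operator `B` whose Jacobian `dB` is square and invertible. The Gauss–Newton increment is computed independently as a least-squares problem in the 𝑌 ∗ norm, and the steepest descent increment is computed with the step-dependent energy Gram matrix and 𝛼 = 1.

```python
import numpy as np

# Illustrative data (assumed).
G_Y = np.array([[1.5, 0.2], [0.2, 1.0]])
A_lin = np.array([[2.0, 0.5], [0.3, 1.5]])
ell = np.array([1.0, -0.5])
B = lambda x: A_lin @ x + 0.1 * np.sin(x)
dB = lambda x: A_lin + 0.1 * np.diag(np.cos(x))

x_n = np.array([0.4, -0.7])
A_n = dB(x_n)                      # linearization at the current iterate
residual = ell - B(x_n)

# Gauss-Newton increment: minimize ||A_n w - residual||_{Y*}.
# With G_Y = L L^T, the dual norm of f is the Euclidean norm of L^{-1} f.
L = np.linalg.cholesky(G_Y)
dx_gauss_newton = np.linalg.lstsq(np.linalg.solve(L, A_n),
                                  np.linalg.solve(L, residual), rcond=None)[0]

# Steepest descent increment with alpha = 1 for a given Gram matrix on X.
def descent_increment(G_X):
    rhs = A_n.T @ np.linalg.solve(G_Y, residual)   # (dB)^* R_Y^{-1} (ell - B(x_n))
    return np.linalg.solve(G_X, rhs)

energy_gram = A_n.T @ np.linalg.solve(G_Y, A_n)    # step-dependent energy norm
dx_descent_energy = descent_increment(energy_gram)
dx_descent_identity = descent_increment(np.eye(2)) # a different norm on X
```

Only the energy Gram matrix reproduces the Gauss–Newton increment; with another norm on 𝑋 the descent step generally differs.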

10.4. Trade-offs
The steepest descent iteration (10.7) requires the application of the inverse of the
Gram matrices of both the 𝑋 and 𝑌 inner products due to the presence of 𝑅 𝑋−1 and
𝑅𝑌−1 there. In Section 4 we discussed at length the DPG localization techniques to
make 𝑅𝑌−1 easy. However, we also need 𝑅 𝑋−1 to implement (10.5). This is a drawback
of the descent approach. Nonetheless, 𝑅 𝑋 is a linear Hermitian positive definite
operator. Moreover, the component spaces of 𝑋 have norms that are either standard
norms or quotient trace norms, which may be treated as norms arising from Schur
complements of Gram matrices of standard norms. As shown in Barker, Dobrev,
Gopalakrishnan and Kolev (2018), such Schur complements and the corresponding
𝑅 𝑋 can be efficiently preconditioned using off-the-shelf preconditioners.
Consider the alternative of the Newton–Raphson iterations which, in the context
of the minimum residual methodology, translates into the Gauss–Newton method.
Here we require the linearization 𝑑𝐵 𝑥 to satisfy the inf-sup condition. This require-
ment is absent for the steepest descent methodology and is another reason to opt
for it. For example, in nonlinear elasticity, the linearized problem may be singular
(such as in buckling) which, in numerics, manifests as bad conditioning. In the
steepest descent approach, we need only invert well-conditioned Riesz operators.
In both methodologies we need 𝑑𝐵 𝑥 , but in steepest descent we use it only to com-
pute the load, whereas in the Gauss–Newton approach we also use it to compute
and invert the stiffness matrix. Steepest descent naturally provides the possibility
of incorporating additional constraints by solving the minimum residual problem
with additional constraints. Incorporating the constraints through a penalty method
results in a minimal modification of the algorithm and, in particular, allows the
use of a penalty term that is only once differentiable; see Bristeau et al. (1985).
This is a common situation for inequality constraints. Thus the steepest descent
methodology appears to be much more robust than Gauss–Newton.
On the negative side, the steepest descent iterations deliver only linear convergence
compared with the quadratic convergence of Newton methods. It may be
advantageous to combine the two methodologies into a single algorithm. For example,
one may start with the more robust steepest descent iterations and finish
with Gauss–Newton iterations once the increments become small enough.
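One possible way to combine the two iterations is sketched here. It is an illustration only, not an algorithm taken from the cited works: the switching threshold `switch_tol`, the stopping tolerance `stop_tol`, the fixed step size `alpha`, the toy operator and the function name `hybrid_solve` are all assumptions. The loop takes steepest descent steps until the increment is small in the 𝑋-norm and then switches to Gauss–Newton steps.

```python
import numpy as np

def hybrid_solve(B, dB, ell, G_X, G_Y, x0, alpha=0.3,
                 switch_tol=1e-2, stop_tol=1e-12, max_iter=200):
    """Steepest descent first, Gauss-Newton once increments are small."""
    x = np.array(x0, dtype=float)
    use_gauss_newton = False
    history = []                                    # (phase, increment size)
    for _ in range(max_iter):
        A = dB(x)
        rhs = A.T @ np.linalg.solve(G_Y, ell - B(x))
        if use_gauss_newton:
            # Normal equations of the linearized minimum residual problem.
            dx = np.linalg.solve(A.T @ np.linalg.solve(G_Y, A), rhs)
        else:
            dx = alpha * np.linalg.solve(G_X, rhs)  # steepest descent step
        x = x + dx
        size = np.sqrt(dx @ G_X @ dx)               # X-norm of the increment
        history.append(("GN" if use_gauss_newton else "SD", size))
        if size < stop_tol:
            break
        if size < switch_tol:
            use_gauss_newton = True
    return x, history

# Toy problem (assumed): a mildly nonlinear perturbation of a linear bijection.
A_lin = np.array([[2.0, 0.5], [0.3, 1.5]])
B = lambda x: A_lin @ x + 0.1 * np.sin(x)
dB = lambda x: A_lin + 0.1 * np.diag(np.cos(x))
G_X = np.diag([2.0, 1.0])
G_Y = np.eye(2)
x_true = np.array([1.0, -0.5])
ell = B(x_true)                                     # manufactured load

x_sol, history = hybrid_solve(B, dB, ell, G_X, G_Y, x0=np.zeros(2))
```

Inspecting `history` shows the slow linear phase followed by a few rapidly converging Gauss–Newton steps; in practice the thresholds would need tuning to the problem at hand.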

11. Further pointers and conclusion

We conclude this review by giving brief pointers to topics we have not covered. To
keep this review manageable, we have omitted details of many DPG-based tech-
niques, including DPG-style residual minimization in non-Hilbert norms (Muga
and van der Zee 2020), polygonal elements (Vaziri Astaneh, Fuentes, Mora and
Demkowicz 2018), fractional norms (Bacuta, Demkowicz, Mora and Xenophontos
2021a), regularization of rough functionals (Millar, Muga, Rojas and Van der Zee
2022), dispersion analysis (Gopalakrishnan et al. 2014), eigensolvers using con-
tour integrals (Gopalakrishnan, Grubišić, Ovall and Parker 2020), DPG eigenvalue
error indicators (Bertrand, Boffi and Schneider 2023) and connections with non-
conforming methods developed in Carstensen et al. (2014b). Works on residual
minimization under constraints include those of Ellis, Demkowicz and Chan (2014)
and Ellis, Chan and Demkowicz (2016), who consider elementwise conservation,
and Li and Demkowicz (2024), with circulation constraints around holes in the
domain.
Discussion of coupling of DPG methods with other methods was also omitted:
a variational formulation applying the DPG methodology to coupling boundary
integral operators was developed in Heuer and Karkulik (2015), but it led to non-local
optimal test functions for boundary element degrees of freedom. This difficulty
was later overcome by Führer, Heuer and Karkulik (2017), who provided a framework
to efficiently couple the DPG method to the Galerkin boundary element method
(BEM) or other numerical methods. Specifics on coupling of DPG and standard
finite element methods can be found in Führer, Heuer, Karkulik and Rodrı́guez
(2018b), and an application to a singularly perturbed transmission problem can
be found in Führer and Heuer (2017). A DPG BEM for hypersingular boundary
integral operators in three dimensions can be found in Heuer and Karkulik (2017a).
The DPG method has been used in many applications, including incompressible
flows (Roberts, Bui-Thanh and Demkowicz 2014, Roberts et al. 2015), compressible
flows (Chan et al. 2014a), the Cahn–Hilliard equation (Valseth, Romkes
and Kaul 2021) and shallow water equations (Valseth and Dawson 2022). Applic-
ations to elasticity that we have not had a chance to detail include the work on
the Kirchhoff–Love model (Führer, Heuer and Niemi 2019) for thin-structure de-
formation. They discuss conformity for bending moments in 𝐻(div div), the space
of symmetric 𝐿 2 -tensors 𝜏 with div div(𝜏) in 𝐿 2 , appropriate for problems with
non-convex corners. Their analysis and discretization are motivated by the DPG
approach for ultraweak formulations; specifically, a conforming discretization of
bending moments in Führer et al. (2019) was achievable by the restriction to traces,
made possible by the ultraweak DPG setting. Other works applying DPG ideas to plates
and shells include those of Calo, Collier and Niemi (2014), Führer, Heuer and
Niemi (2022, 2023) and Führer, Heuer and Sayas (2020).
DPG ideas are playing an important role in the development of parameter-robust
methods. The work of Broersen, Dahmen and Stevenson (2018) gave stability
estimates for the DPG method applied to the linear transport equation that are
uniform in the relative orientation of the local mesh and the advection direction.
For singularly perturbed problems, specifically for advection-dominated diffusion,
parameter-robust stability was confirmed in Demkowicz and Heuer (2013) and
Chan, Heuer, Bui-Thanh and Demkowicz (2014b). The case of reaction-dominated
diffusion was studied in Heuer and Karkulik (2017b).
One of the attractive features of the DPG method is that it only requires the solu-
tion of a symmetric positive definite system, even when the original boundary value
problem is non-self-adjoint. This can be leveraged in the design of iterative solvers
and high performance computing. In Barker et al. (2018), one can find specifics
on how to combine off-the-shelf algebraic preconditioners effectively to develop
highly scalable DPG solvers. A DPG solver for harmonic wave propagation, in-
tegrated within an adaptive procedure, through a two-grid-like preconditioner for
the conjugate gradient method, was developed in Petrides and Demkowicz (2017)
as well as in Badger, Henneking, Petrides and Demkowicz (2023); it exhibits
excellent practical efficiency. Connecting DPG with other similar saddle-point
least-squares systems, certain solvers are suggested in Bacuta, Hayes and Jacavage
(2021b). Parameter robustness in DPG solvers is still highly sought after in specific
applications and remains an active area of research.
The design of stable spacetime formulations by DPG techniques is another topic
we have not detailed in this review, except for the early work in Demkowicz et al.
(2017), which is applicable beyond spacetime problems. Since then, spacetime
DPG formulations for transient waves have been studied in Gopalakrishnan and
Sepúlveda (2019), Sepúlveda (2018) and Ernesti and Wieners (2019), and a space-
time DPG method for the heat equation was developed in Diening and Storn (2022).
The approach of Demkowicz et al. (2017) to prove wellposedness in graph spaces
(along the lines of the theory of Friedrichs systems) was found to be difficult for
various spacetime problems. A new approach has been proposed in the recent
work of Führer, González and Karkulik (2024) using Bochner spaces. This shows
promise for reducing the technicalities in proving convergence of DPG and residual
minimization methods for spacetime problems.
An exciting new frontier is the use of DPG ideas for variationally correct machine
learning approaches. The very recent work of Rojas et al. (2024) defines a quadratic
loss functional, motivated by DPG-type formulations, within a physics-informed
neural network to solve a boundary value problem (and earlier developments in
physics-informed neural networks can be found in Kharazmi, Zhang and Karniada-
kis 2021). The recent work of Bachmayr, Dahmen and Oster (2024) centres around
learning the parameter-to-solution map for systems of partial differential equations
that depend on a potentially large number of parameters. These works show
emerging techniques based on DPG formulations with a variationally correct residual
(measured in a dual norm like the ones we have seen in earlier sections) forming
the basis for loss functionals in machine learning. The tools we have developed
here can be used to establish that a loss function based on such dual residuals is
uniformly proportional to the squared solution error in a mathematically correct
norm. Such results show potential to augment machine learning predictions with
rigorous a posteriori accuracy control. Other recent works that combine machine
learning with DPG and residual minimization ideas include Brevis et al. (2024)
and Brevis, Muga and van der Zee (2022).
These references show the wide variety of topics that the DPG ideas have im-
pacted. The essential theoretical underpinnings of the DPG methodology, discussed
earlier, should be enough preparation to delve into the above-mentioned works for
further studies. We limited the scope of earlier sections by selecting topics for
discussion that have potential applicability to a large variety of boundary value
problems. Discussions of specific boundary value problems have been delineated
as brief examples throughout, but the cited original sources are recommended for
a complete picture of each case.

Acknowledgements
The ideas behind DPG methods were developed over the years, utilizing support
from the Air Force Office of Scientific Research (AFOSR) and the National Science
Foundation (NSF). Both authors are immensely grateful for this support. The
preparation of this review was supported in part by AFOSR grant FA9550-23-1-
0103 and NSF grant 2245077. This work also benefited from activities organized
under the auspices of NSF RTG grant 2136228 as well as a decadal series of
workshops on DPG and residual minimization methods held in Austin (USA,
2013), Delft (The Netherlands, 2015), Portland (USA, 2017), Berlin (Germany,
2019), Santiago (Chile, 2022) and Bilbao (Spain, 2024). The authors gratefully
acknowledge feedback from several participants of these workshops which shaped
this review.

References
A. H. Al-Mohy and N. J. Higham (2011), Computing the action of the matrix exponential,
with an application to exponential integrators, SIAM J. Sci. Comput. 33, 488–511.
D. N. Arnold, R. S. Falk and R. Winther (2006), Finite element exterior calculus, homolo-
gical techniques, and applications, Acta Numer. 15, 1–155.
I. Babuška (1971), Error-bounds for finite element method, Numer. Math. 16, 322–333.
I. Babuška, A. K. Aziz, G. Fix and R. B. Kellogg (1972), Survey lectures on the mathe-
matical foundations of the finite element method, in The Mathematical Foundations of
the Finite Element Method with Applications to Partial Differential Equations (A. K.
Aziz, ed.), Academic Press, pp. 1–359.
M. Bachmayr, W. Dahmen and M. Oster (2024), Variationally correct neural residual
regression for parametric PDEs: On the viability of controlled accuracy. Available at
arXiv:2405.20065.

C. Bacuta, L. Demkowicz, J. Mora and C. Xenophontos (2021a), Analysis of non-conforming
DPG methods on polyhedral meshes using fractional Sobolev norms, Comput.
Math. Appl. 95, 215–241.
C. Bacuta, D. Hayes and J. Jacavage (2021b), Notes on a saddle point reformulation of
mixed variational problems, Comput. Math. Appl. 95, 4–18.
J. Badger, S. Henneking, S. Petrides and L. Demkowicz (2023), Scalable DPG multigrid
solver for Helmholtz problems: A study on convergence, Comput. Math. Appl. 148,
81–92.
P. E. Barbone and I. Harari (2001), Nearly 𝐻 1 -optimal finite element methods, Comput.
Methods Appl. Mech. Engrg 190, 5679–5690.
A. T. Barker, V. Dobrev, J. Gopalakrishnan and T. Kolev (2018), A scalable preconditioner
for a DPG method, SIAM J. Sci. Comput. 40, A1187–A1203.
J. W. Barrett and K. W. Morton (1984), Approximate symmetrization and Petrov–Galerkin
methods for diffusion-convection problems, Comput. Methods Appl. Mech. Engrg 45,
97–122.
J. Bergh and J. Löfström (1976), Interpolation Spaces: An Introduction, Springer.
F. Bertrand, D. Boffi and H. Schneider (2023), Discontinuous Petrov–Galerkin approxim-
ation of eigenvalue problems, Comput. Methods Appl. Math. 23, 1–17.
P. B. Bochev and M. D. Gunzburger (2009), Least-Squares Finite Element Methods, Vol.
166 of Applied Mathematical Sciences, Springer.
T. Bouma, J. Gopalakrishnan and A. Harb (2014), Convergence rates of the DPG method
with reduced test space degree, Comput. Math. Appl. 68, 1550–1561.
J. H. Bramble and J. E. Pasciak (2004), A new approximation technique for div-curl systems,
Math. Comp. 73, 1739–1762.
J. H. Bramble, R. D. Lazarov and J. E. Pasciak (1997), A least-squares approach based on
a discrete minus one inner product for first order systems, Math. Comp. 66, 935–955.
I. Brevis, I. Muga and K. G. van der Zee (2022), Neural control of discrete weak formula-
tions: Galerkin, least squares & minimal-residual methods with quasi-optimal weights,
Comput. Methods Appl. Mech. Engrg 402, art. 115716.
I. Brevis, I. Muga, D. Pardo, O. Rodriguez and K. G. van der Zee (2024), Learning
quantities of interest from parametric PDEs: An efficient neural-weighted minimal
residual approach, Comput. Math. Appl. 164, 139–149.
H. Brezis (2011), Functional Analysis, Sobolev Spaces and Partial Differential Equations,
Universitext, Springer.
F. Brezzi (1974), On the existence, uniqueness and approximation of saddle-point prob-
lems arising from Lagrangian multipliers, Rev. Française Automat. Informat. Recherche
Opérationnelle Sér. Rouge 8, 129–151.
F. Brezzi and M. Fortin (1991), Mixed and Hybrid Finite Element Methods, Vol. 15 of
Springer Series in Computational Mathematics, Springer.
M. O. Bristeau, O. Pironneau, R. Glowinski, J. Périaux, J. P. Perrier and G. Poirier (1979),
On the numerical solution of nonlinear problems in fluid dynamics by least squares and
finite element methods, I: Least square formulations and conjugate gradient solution of
the continuous problems, Comput. Methods Appl. Mech. Engrg 17-18, 619–657.
M. O. Bristeau, O. Pironneau, R. Glowinski, J. Périaux, J. P. Perrier and G. Poirier (1985),
On the numerical solution of nonlinear problems in fluid dynamics by least squares and
finite element methods, II: Application to transonic flow simulations, Comput. Methods
Appl. Mech. Engrg 51, 363–394.

D. Broersen, W. Dahmen and R. P. Stevenson (2018), On the stability of DPG formulations
of transport equations, Math. Comp. 87, 1051–1082.
A. Buffa, M. Costabel and D. Sheen (2002), On traces for H(curl, Ω) in Lipschitz domains,
J. Math. Anal. Appl. 276, 845–867.
T. Bui-Thanh and O. Ghattas (2014), A PDE-constrained optimization approach to the
discontinuous Petrov–Galerkin method with a trust region inexact Newton-CG solver,
Comput. Methods Appl. Mech. Engrg 278, 20–40.
Z. Cai, R. Lazarov, T. A. Manteuffel and S. F. McCormick (1994), First-order system
least squares for second-order partial differential equations I, SIAM J. Numer. Anal. 31,
1785–1799.
Z. Cai, T. A. Manteuffel, S. F. McCormick and J. Ruge (2001), First-order system LL∗
(FOSLL*): Scalar elliptic partial differential equations, SIAM J. Numer. Anal. 39,
1418–1445.
V. M. Calo, N. O. Collier and A. H. Niemi (2014), Analysis of the discontinuous Petrov–
Galerkin method with optimal test functions for the Reissner–Mindlin plate bending
model, Comput. Math. Appl. 66, 2570–2586.
P. Cantin and N. Heuer (2018), A DPG framework for strongly monotone operators, SIAM
J. Numer. Anal. 56, 2731–2750.
C. Carstensen, P. Bringmann, F. Hellwig and P. Wriggers (2018), Nonlinear discontinuous
Petrov–Galerkin methods, Numer. Math. 139, 529–561.
C. Carstensen, L. Demkowicz and J. Gopalakrishnan (2014a), A posteriori error control
for DPG methods, SIAM J. Numer. Anal. 52, 1335–1353.
C. Carstensen, L. Demkowicz and J. Gopalakrishnan (2016), Breaking spaces and forms for
the DPG method and applications including Maxwell equations, Comput. Math. Appl.
72, 494–522.
C. Carstensen, D. Gallistl, F. Hellwig and L. Weggler (2014b), Low-order dPG-FEM for
an elliptic PDE, Comput. Math. Appl. 68, 1503–1512.
A. Celia, T. F. Russell, H. Ismael and R. E. Ewing (1990), An Eulerian–Lagrangian
localized adjoint method for the advection–diffusion equation, Adv. Water Resources
13, 187–206.
J. Chan, L. Demkowicz and R. Moser (2014a), A DPG method for steady viscous com-
pressible flow, Comput. Fluids 98, 69–90.
J. Chan, N. Heuer, T. Bui-Thanh and L. Demkowicz (2014b), Robust DPG method for
convection-dominated diffusion problems II: Adjoint boundary conditions and mesh-
dependent test norms, Comput. Math. Appl. 67, 771–795.
B. Cockburn and J. Gopalakrishnan (2004), A characterization of hybridized mixed meth-
ods for the Dirichlet problem, SIAM J. Numer. Anal. 42, 283–301.
A. Cohen, W. Dahmen and G. Welper (2012), Adaptivity and variational stabilization for
convection–diffusion equations, ESAIM Math. Model. Numer. Anal. 46, 1247–1273.
R. Courant and K. O. Friedrichs (1948), Supersonic Flow and Shock Waves, Interscience.
L. Demkowicz (2018), Lecture notes on energy spaces. Report 18-13, Oden Institute, The
University of Texas at Austin.
L. Demkowicz (2024), Mathematical Theory of Finite Elements, Computational Science
and Engineering, SIAM.
L. Demkowicz and J. Gopalakrishnan (2010), A class of discontinuous Petrov–Galerkin
methods, Part I: The transport equation, Comput. Methods Appl. Mech. Engrg 199,
1558–1572.

L. Demkowicz and J. Gopalakrishnan (2011a), Analysis of the DPG method for the Poisson
equation, SIAM J. Numer. Anal. 49, 1788–1809.
L. Demkowicz and J. Gopalakrishnan (2011b), A class of discontinuous Petrov–Galerkin
methods, Part II: Optimal test functions, Numer. Methods Partial Differential Equations
27, 70–105.
L. Demkowicz and J. Gopalakrishnan (2013), A primal DPG method without a first-order
reformulation, Comput. Math. Appl. 66, 1058–1064.
L. Demkowicz and J. Gopalakrishnan (2017), Discontinuous Petrov Galerkin (DPG)
method, in Encyclopedia of Computational Mechanics, second edition, Wiley Com-
putational Mechanics Online.
L. Demkowicz and N. Heuer (2013), Robust DPG method for convection-dominated dif-
fusion problems, SIAM J. Numer. Anal. 51, 2514–2537.
L. Demkowicz and J. T. Oden (1986a), An adaptive characteristic Petrov–Galerkin finite
element method for convection-dominated linear and nonlinear parabolic problems in
one space variable, J. Comput. Phys. 67, 188–213.
L. Demkowicz and J. T. Oden (1986b), An adaptive characteristic Petrov–Galerkin finite
element method for convection-dominated linear and nonlinear parabolic problems in
two space variables, Comput. Methods Appl. Mech. Engrg 55, 63–87.
L. Demkowicz and P. Zanotti (2020), Construction of DPG Fortin operators revisited,
Comput. Math. Appl. 80, 2261–2271.
L. Demkowicz, J. Gopalakrishnan and B. Keith (2020), The DPG-star method, Comput.
Math. Appl. 79, 3092–3116.
L. Demkowicz, J. Gopalakrishnan and A. Niemi (2012a), A class of discontinuous Petrov–
Galerkin methods, Part III: Adaptivity, Appl. Numer. Math. 62, 396–427.
L. Demkowicz, J. Gopalakrishnan, I. Muga and J. Zitelli (2012b), Wavenumber explicit
analysis for a DPG method for the multidimensional Helmholtz equation, Comput.
Methods Appl. Mech. Engrg 213/216, 126–138.
L. Demkowicz, J. Gopalakrishnan, S. Nagaraj and P. Sepúlveda (2017), A spacetime DPG
method for the Schrödinger equation, SIAM J. Numer. Anal. 55, 1740–1759.
L. Diening and J. Storn (2022), A space-time DPG method for the heat equation, Comput.
Math. Appl. 105, 41–53.
I. Ekeland and R. Témam (1999), Convex Analysis and Variational Problems, Vol. 28 of
Classics in Applied Mathematics, SIAM.
T. Ellis, J. Chan and L. Demkowicz (2016), Robust DPG methods for transient convection–
diffusion, in Building Bridges: Connections and Challenges in Modern Approaches to
Numerical Partial Differential Equations (G. R. Barrenechea et al., eds), Vol. 114 of
Lecture Notes in Computational Science and Engineering, Springer.
T. Ellis, L. Demkowicz and J. Chan (2014), Locally conservative discontinuous Petrov–
Galerkin finite elements for fluid problems, Comput. Math. Appl. 68, 1530–1549.
A. Ern and J.-L. Guermond (2021), Finite Elements II, Springer.
A. Ern, J.-L. Guermond and G. Caplain (2007), An intrinsic criterion for the bijectivity of
Hilbert operators related to Friedrichs’ systems, Commun. Partial Differential Equations
32, 317–341.
J. Ernesti and C. Wieners (2019), A space-time discontinuous Petrov–Galerkin method for
acoustic waves, in Space-Time Methods: Applications to Partial Differential Equations
(U. Langer and O. Steinbach, eds), De Gruyter, pp. 89–116.
K. O. Friedrichs (1958), Symmetric positive linear differential equations, Commun. Pure
Appl. Math. 11, 333–418.
T. Führer (2018), Superconvergence in a DPG method for an ultra-weak formulation,
Comput. Math. Appl. 75, 1705–1718.
T. Führer and N. Heuer (2017), Robust coupling of DPG and BEM for a singularly perturbed
transmission problem, Comput. Math. Appl. 74, 1940–1954.
T. Führer and N. Heuer (2019), Fully discrete DPG methods for the Kirchhoff–Love plate
bending model, Comput. Methods Appl. Mech. Engrg 343, 550–571.
T. Führer and N. Heuer (2024), Robust DPG test spaces and Fortin operators: The 𝐻¹ and
𝐻(div) cases, SIAM J. Numer. Anal. 62, 718–748.
T. Führer, R. González and M. Karkulik (2024), Well-posedness of first-order acoustic wave
equations and space-time finite element approximation. Available at arXiv:2311.10536.
T. Führer, N. Heuer and M. Karkulik (2017), On the coupling of DPG and BEM, Math.
Comp. 86, 2261–2284.
T. Führer, N. Heuer and A. H. Niemi (2019), An ultraweak formulation of the Kirchhoff–
Love plate bending model and DPG approximation, Math. Comp. 88, 1587–1619.
T. Führer, N. Heuer and A. H. Niemi (2022), A DPG method for shallow shells, Numer.
Math. 152, 76–99.
T. Führer, N. Heuer and A. H. Niemi (2023), A DPG method for Reissner–Mindlin plates,
SIAM J. Numer. Anal. 61, 995–1017.
T. Führer, N. Heuer and F.-J. Sayas (2020), An ultraweak formulation of the Reissner–
Mindlin plate bending model and DPG approximation, Numer. Math. 145, 313–344.
T. Führer, N. Heuer and E. P. Stephan (2018a), On the DPG method for Signorini problems,
IMA J. Numer. Anal. 38, 1893–1926.
T. Führer, N. Heuer, M. Karkulik and R. Rodríguez (2018b), Combining the DPG method
with finite elements, Comput. Methods Appl. Math. 18, 639–652.
V. V. Garg, S. Prudhomme, K. G. van der Zee and G. F. Carey (2014), Adjoint-consistent
formulations of slip models for coupled electroosmotic flow systems, Adv. Model. Simul.
Engrg 2, art. 15.
J. Gopalakrishnan (2013), Five lectures on DPG methods. Available at arXiv:1306.0557.
J. Gopalakrishnan and W. Qiu (2014), An analysis of the practical DPG method, Math.
Comp. 83, 537–552.
J. Gopalakrishnan and P. Sepúlveda (2019), A spacetime DPG method for acoustic waves,
in Space-Time Methods: Applications to Partial Differential Equations (U. Langer and
O. Steinbach, eds), Radon Series on Computational and Applied Mathematics, De
Gruyter, pp. 129–154.
J. Gopalakrishnan, L. Grubišić, J. Ovall and B. Q. Parker (2020), Analysis of FEAST
spectral approximations using the DPG discretization, Comput. Methods Appl. Math.
89, 203–228.
J. Gopalakrishnan, I. Muga and N. Olivares (2014), Dispersive and dissipative errors in the
DPG method with scaled norms for the Helmholtz equation, SIAM J. Sci. Comput. 36,
A20–A39.
P. Grisvard (1985), Elliptic Problems in Nonsmooth Domains, Vol. 24 of Monographs and
Studies in Mathematics, Pitman Advanced Publishing Program.
N. Heuer and M. Karkulik (2015), DPG method with optimal test functions for a transmission
problem, Comput. Math. Appl. 70, 1504–1518.
N. Heuer and M. Karkulik (2017a), Discontinuous Petrov–Galerkin boundary elements,
Numer. Math. 135, 1011–1043.
N. Heuer and M. Karkulik (2017b), A robust DPG method for singularly perturbed reaction–
diffusion problems, SIAM J. Numer. Anal. 55, 1218–1242.
M. Jensen (2004), Discontinuous Galerkin methods for Friedrichs systems with irregular
solutions. PhD thesis, University of Oxford.
T. Kato (1995), Perturbation Theory for Linear Operators, Classics in Mathematics,
Springer.
B. Keith (2018), New ideas in adjoint methods for PDEs: A saddle-point paradigm for
finite element analysis and its role in the DPG methodology. PhD thesis, The University
of Texas at Austin.
B. Keith, A. V. Astaneh and L. Demkowicz (2019), Goal-oriented adaptive mesh refinement
for discontinuous Petrov–Galerkin methods, SIAM J. Numer. Anal. 57, 1649–1676.
E. Kharazmi, Z. Zhang and G. E. M. Karniadakis (2021), hp-VPINNs: Variational physics-
informed neural networks with domain decomposition, Comput. Methods Appl. Mech.
Engrg 374, art. 113547.
J. Li (2024), A nonlinear mixed problem framework for discontinuous Petrov Galerkin
(DPG) methods. PhD thesis, The University of Texas at Austin.
J. Li and L. Demkowicz (2024), A DPG method for planar div-curl problems, Comput.
Math. Appl. 159, 31–43.
A. F. D. Loula and D. T. Fernandes (2009), A quasi optimal Petrov–Galerkin method for
Helmholtz problem, Internat. J. Numer. Methods Engrg 80, 1595–1622.
A. F. D. Loula, T. J. R. Hughes and L. P. Franca (1987), Petrov–Galerkin formulations of
the Timoshenko beam problem, Comput. Methods Appl. Mech. Engrg 63, 115–132.
J. M. Melenk (1995), On generalized finite element methods. PhD thesis, University of
Maryland.
F. Millar, I. Muga, S. Rojas and K. G. Van der Zee (2022), Projection in negative norms
and the regularization of rough linear functionals, Numer. Math. 150, 1087–1121.
P. Monk (2003), Finite Element Methods for Maxwell’s Equations, Numerical Mathematics
and Scientific Computation, Oxford University Press.
I. Muga and K. G. van der Zee (2020), Discretization of linear problems in Banach spaces:
Residual minimization, nonlinear Petrov–Galerkin, and monotone mixed methods, SIAM
J. Numer. Anal. 58, 3406–3426.
J. Muñoz-Matute and L. Demkowicz (2024), Multistage discontinuous Petrov–Galerkin
time-marching scheme for nonlinear problems, SIAM J. Numer. Anal. 62, 1956–1978.
J. Muñoz-Matute, L. Demkowicz and D. Pardo (2022), Error representation of the time-
marching DPG scheme, Comput. Methods Appl. Mech. Engrg 391, art. 114480.
J. Muñoz-Matute, D. Pardo and L. Demkowicz (2021), Equivalence between the DPG
method and the exponential integrators for linear parabolic problems, J. Comput. Phys.
429, art. 110016.
J. Nečas (1962), Sur une méthode pour résoudre les équations aux dérivées partielles du
type elliptique, voisine de la variationnelle, Ann. Sc. Norm. Super. Pisa Cl. Sci. 16,
305–326.
J.-C. Nédélec (1980), Mixed finite elements in ℝ³, Numer. Math. 35, 315–341.
J. Niesen and W. M. Wright (2012), Algorithm 919: A Krylov subspace algorithm for
evaluating the 𝜙-functions appearing in exponential integrators, ACM Trans. Math.
Softw.
J. Nocedal and S. J. Wright (2006), Numerical Optimization, second edition, Springer.
S. Petrides and L. Demkowicz (2017), An adaptive DPG method for high frequency time-
harmonic wave propagation problems, Comput. Math. Appl. 74, 1999–2017.
P.-A. Raviart and J. M. Thomas (1977a), A mixed finite element method for 2-nd order
elliptic problems, in Mathematical Aspects of Finite Element Methods, Vol. 606 of
Lecture Notes in Mathematics, Springer, pp. 292–315.
P.-A. Raviart and J. M. Thomas (1977b), Primal hybrid finite element methods for 2nd
order elliptic equations, Math. Comp. 31, 391–413.
N. V. Roberts, T. Bui-Thanh and L. Demkowicz (2014), The DPG method for the Stokes
problem, Comput. Math. Appl. 67, 966–995.
N. V. Roberts, L. Demkowicz and R. Moser (2015), A discontinuous Petrov–Galerkin
methodology for adaptive solutions to the incompressible Navier–Stokes equations,
J. Comput. Phys. 301, 456–483.
S. Rojas, P. Maczuga, J. Muñoz-Matute, D. Pardo and M. Paszyński (2024), Robust
variational physics-informed neural networks, Comput. Methods Appl. Mech. Engrg
425, art. 116904.
P. Sepúlveda (2018), Spacetime numerical techniques for the wave and Schrödinger equa-
tions. PhD thesis, Portland State University.
D. Sheen (1992), A generalized Green’s theorem, Appl. Math. Lett. 5, 95–98.
E. Valseth and C. Dawson (2022), A stable space-time FE method for the shallow water
equations, Comput. Geosci. pp. 1–18.
E. Valseth, A. Romkes and A. R. Kaul (2021), A stable FE method for the space-time
solution of the Cahn–Hilliard equation, J. Comput. Phys. 441, art. 110426.
A. Vaziri Astaneh, F. Fuentes, J. Mora and L. Demkowicz (2018), High-order polygonal
discontinuous Petrov–Galerkin (PolyDPG) methods using ultraweak formulations, Com-
put. Methods Appl. Mech. Engrg 332, 686–711.
J. Xu and L. Zikatanov (2003), Some observations on Babuška and Brezzi theories, Numer.
Math. 94, 195–202.
J. Zitelli, I. Muga, L. Demkowicz, J. Gopalakrishnan, D. Pardo and V. Calo (2011), A class
of discontinuous Petrov–Galerkin methods, Part IV: Wave propagation, J. Comput.
Phys. 230, 2406–2432.
