A Modern C++ Point of View of Programming in Image Processing - 2022

The document discusses the advantages of using Modern C++ for image processing, highlighting its multi-paradigm nature that allows for efficient algorithm development. It emphasizes the role of generic programming in creating reusable software components for various image types, while also addressing the challenges of balancing genericity, efficiency, and simplicity. The authors focus on C++20 features such as ranges, views, and concepts to enhance the development of generic image algorithms and improve performance.

A Modern C++ Point of View of Programming in Image Processing
Michaël Roynard, Edwin Carlinet, Thierry Géraud

To cite this version:

Michaël Roynard, Edwin Carlinet, Thierry Géraud. A Modern C++ Point of View of Programming in Image Processing. 2022. hal-03564252

HAL Id: hal-03564252
https://hal.archives-ouvertes.fr/hal-03564252
Preprint submitted on 10 Feb 2022

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A Modern C++ Point of View of Programming in Image Processing

Michaël Roynard, Edwin Carlinet, Thierry Géraud
EPITA Research and Development Laboratory (LRDE)
14-16 rue Voltaire, 94270, Le Kremlin-Bicêtre, France

February 10, 2022

Abstract
C++ is a multi-paradigm language that enables the programmer to set up efficient image processing algorithms easily. The strength of the language comes from many aspects. C++ is high-level, which enables developing powerful abstractions and mixing different programming styles to ease development. At the same time, C++ is low-level and can take full advantage of the hardware to deliver the best performance. It is also very portable and highly compatible, which allows algorithms to be called from high-level, fast-prototyping languages such as Python or Matlab. One of the most fundamental aspects where C++ really shines is generic programming. Generic programming makes it possible to develop and reuse bricks of software on objects (images) of different natures (types) without performance loss. Nevertheless, reconciling genericity, efficiency, and simplicity at the same time is not trivial. Modern C++ (post-2011) has brought new features that made it simpler and more powerful. In this paper, we focus in particular on some C++20 aspects of generic programming: ranges, views, and concepts, and see how they extend to images to ease the development of generic image algorithms while lowering the computation time.
Keywords: Image processing, Generic Programming, Modern C++, Software, Performance

1 Introduction
C++ claims to “leave no room for a lower-level language (except assembler)” [38], which makes it a go-to language when developing high-performance computing (HPC) image processing applications. The language is designed after a zero-overhead abstraction principle that allows us to devise high-level but efficient solutions to image processing problems. Other aspects of C++ are its stability, its portability on a wide range of architectures, and its direct interface with the C language, which makes C++ easily interoperable with high-level prototyping languages. This is why the performance-sensitive features of many image processing libraries (and numerical libraries in general) are implemented in C++ (or C/Fortran as in OpenCV [7], IPP [11]) or with a hardware-dedicated language (e.g. CUDA [8]) and are exposed through a high-level API to Python, Lua...

Figure 1: The watershed segmentation algorithm runs on a 2D-regular grayscale image (left), on a vertex-valued graph (middle) and on a 3D mesh (right).

Apart from the performance considerations, the problem lies in that each image processing field comes with its own set of image types to process. Obviously, the most common image type is an image of RGB or gray-level values, encoded with 8 bits per channel, on a regular 2D rectangular domain, which covers 90% of common usages. However, with the development of new devices have come new image types: 3D multi-band images in Medical Imaging, hyperspectral images in Astronomical Imaging, images with complex values in Signal Processing... Some devices generate images with a depth channel which is encoded with a number of bits different from the other channels... An image processing library able to handle those image types would cover 99% of use cases. Finally, the remaining 1% would cover the usage of esoteric image types.
In Digital Topology, we have to deal with non-regular domains where pixels are not regular pixels. They might be super-pixels produced by a segmentation algorithm, hexagonal pixels, pixels defined on some special grids (e.g. the cairo pattern [19]) or even meshes’ vertices. In Mathematical Morphology, most image operators are defined on a graph

framework and are naturally extended to a hierarchical representation of the image (e.g. operators on hierarchies of segmentation [28], trees [12] or a shape space [44]). The fact that image processing is related to many fields has already led Järvi to wonder how one can easily adapt types to fit different image formalisms [22].
From a programming standpoint, the ability to run the same algorithm (code) over a different set of image types, as shown in fig. 1, is called genericity. This term was defined by Musser in [30] as follows: “By generic programming we mean the definition of algorithms and data structures at an abstract or generic level, thereby accomplishing many related programming tasks simultaneously. The central notion is that of generic algorithms, which are parametrized procedural schemata that are completely independent of the underlying data representation and are derived from concrete, efficient algorithms.” To illustrate our point, we will consider a simple yet complex enough image operation: the dilation of an image f by a flat structuring element (SE) B, defined as

\[ g(x) = \bigvee_{y \in B_x} f(y) \tag{1} \]

Simply said, it consists in taking the supremum of the values in the region B centered in x. Despite its apparent simplicity, this operator allows a high variability of the inputs: f can be a regular 2D image as well as a graph; values can be grayscale as well as colors; the SE can be a rectangle as well as a disc adaptive to the local content... The straightforward implementation in fig. 2 covers only one possible set of parameters: the dilation of 8-bit grayscale 2D images by a rectangle. The combinatorial set of parameters increases drastically with the types of the inputs, as seen in fig. 3. In [34], the authors depict four different approaches to leverage “genericity” in order to write a generic version of an algorithm.

    void dilate_rect(image2d_u8 in, image2d_u8 out, int w, int h) {
      for (int y = 0; y < out.height(); ++y)
        for (int x = 0; x < out.width(); ++x) {
          uint8_t s = 0;  // neutral element of max for unsigned values
          for (int qy = y - h/2; qy <= y + h/2; ++qy)
            for (int qx = x - w/2; qx <= x + w/2; ++qx)
              // skip the neighbors that fall outside the input image
              if (0 <= qy && qy < in.height() && 0 <= qx && qx < in.width())
                s = max(s, in(qx, qy));
          out(x, y) = s;
        }
    }

Figure 2: Non-generic dilation algorithm for 8-bits grayscale 2D-images by a rectangle.

Figure 3: The combinatorial set of inputs that a dilation operator may handle. [Diagram labels: SE (Square, Diamond, Ball); Structure (2D-buffer, 3D-buffer, graph); Values (16-bits int, double, 8-bits RGB); annotation: “Possible uses of the dilation with a square SE.”]

With Ad-hoc polymorphism (A), one has to write one implementation for each image type, which involves code duplication to be exhaustive. The ability to select which implementation will run is based on the “real” type of the image. In C++, if this information is known at compile time (static), the compiler selects the right implementation by itself (static dispatch by overload resolution). If the “real” type of the image is only known dynamically, one has to select the correct implementation by hand by writing boilerplate code.
With Generalization (B), one has to consider a common type for all images (let us name it the super-type) and write algorithms for this common type. It implies conversions back and forth between the super-type and the other image types for every computation.
With Inclusion Polymorphism, Dynamic Traits (C), one has to define an abstract type featuring all common image operations. For example, one may consider that all images must define an operator get_value(Point p) -> Any where Point is a type able to contain any point value (2d, 3d, graph vertex...) and Any a type able to hold any value. This is generally achieved using inclusion polymorphism in Object-Oriented Programming with an interface and/or abstract type AbstractImage for all image types. It may also be achieved using more modern techniques such as type-erasure with a type AnyImage (that has the same interface as AbstractImage) to which any image can be converted.

Whatever the technique used behind the scenes, it relies on a dynamic dispatch at runtime to resolve which interface method is called.
Parametric Polymorphism, Generics, Static Traits (D) is somewhat related to Inclusion Polymorphism: one also has to define an abstraction for the handled images. However, the main difference lies in the dispatch, which is static and has the best performance. The compiler writes a new specialized version for each input image type by itself thanks to template algorithms. C++ generic programming will be reviewed more in depth in section 2.
Most libraries do not fall into a single category but mix different techniques. For instance, CImg [39] mixes (B) and (D) by considering only 4D-images parametrized by their value type. In OpenCV [7], algorithms take generalized input types (C) but dispatch dynamically and manually on the value type (A) to get a concrete type and call a generic algorithm (D). Scikit-image [40] relies on Scipy [23], which has a C-style object dynamic abstraction of nd-arrays and iterators (C), and sometimes dispatches by hand to the most specialized algorithm based on the element type (A). Many libraries have chosen the (D) option with a certain level of genericity (Boost GIL [6], Vigra [25], GrAL [4], DGTal [14], Higra [32], and Pylene [13]). The pros and cons of the aforementioned approaches are compared in table 1. We can see in this table that Generic Programming in C++20 checks all the boxes that we are interested in.

Table 1: Genericity approaches: pros and cons.

    Paradigm                  TC   CS   E    One IA   EA
    Code Duplication          ✓    ✗    ✓    ✗        ✗
    Code Generalization       ✗    ≈    ≈    ✓        ✗
    Object-Orientation        ≈    ✓    ✗    ✓        ✓
    Generic Programming:
      with C++11              ✓    ≈    ✓    ✓        ≈
      with C++17              ✓    ✓    ✓    ✓        ≈
      with C++20              ✓    ✓    ✓    ✓        ✓

    TC: type checking; CS: code simplicity; E: efficiency; One IA: one implementation per algorithm; EA: explicit abstractions / constrained genericity

The recent advances in the C++ language [35] have eased the development of high-performance code, and scientific libraries have taken advantage of these features [21, 29, 43]. Modern C++ has brought generic programming to a higher level through ranges [31] and concepts [16]. The contributions of this paper are two-fold. First, we revisit the definitions of images and algorithms to extend the range views to images. In particular, we enable mixing types and algorithms in some new types that are composable. Second, we show that it yields a performance boost while preserving usability, which could benefit libraries relying on the (D) approach.
The paper is organized as follows. In section 2, we review some basics of generic programming and explain how the authors leverage C++20’s concepts to abstract image types by designing a generic framework. In section 3, we present C++20’s ranges, in particular range views, and we contribute by extending this design to images. We also discuss and compare our contribution with a state-of-the-art solution that may seem similar to ours in section 4. Finally, in section 5, we validate the performance gain on a real-case benchmark.

2 Algebraic Properties of Images and Related Notions

2.1 The Abstract Nature of Algorithms
Most algorithms are generic by nature, as demonstrated in the Standard Template Library (STL) [36] when one has to work on a collection of data. For example, let us consider the algorithm maxof(Collection c) that gets the maximal element of a collection (see fig. 4). It does not matter whether the collection is actually implemented with a linked-list, a contiguous buffer of elements or whatever data structure. The only requirements of this algorithm are: (1) we can iterate through the collection; (2) the type of the elements is regular (i.e. behaves the same way as a primitive type like int) and forms a monoid with an associative operator “max” and a neutral element “0”.

    template <typename T>
    concept MaxMonoid = requires(T x) {
      T(0);               // a neutral element "0" can be built
      { x = max(x, x) };  // "max" is available and its result is assignable
    };

    template <Range R>
      requires MaxMonoid<value_t<R>>
    auto maxof(R col) {
      value_t<R> s = 0;
      for (auto e : col)
        s = max(s, e);
      return s;
    }

Figure 4: A generic concept-checked maxof algorithm over a collection.

Actually, (1) is abstracted by pairs of iterators in the STL and by ranges in C++20, while C++20 introduces concepts to check whether a type follows the requirements of the algorithm. The term “concept” is defined as follows in [16]: “a set of axioms satisfied by a data type and a set of operations on it.”
While cataloging the image processing operators and algorithms, the authors could extract three main families of algorithms. First are the point-wise algorithms, which consist in traversing each pixel one by one to perform an operation limited to this pixel (e.g. filling the image with a value). Second are the local algorithms, which consist in traversing each pixel one by one to perform an operation that considers a window of pixels around this pixel. This window is defined by a structuring element. Typical mathematical morphology algorithms such as dilation and closing are part of that family. Finally, there are the global algorithms, which consist in traversing each pixel one by one to perform an operation which may need to consider all the pixels of the image at once, including the previous pixels in the traversing order which have already been transformed. These algorithms typically propagate a transformation across the whole image. The chamfer distance transformation is a good example of such an algorithm.
When addressing how to write a concept, one should always refer to the following rule: “It is not the types that define the concepts: it is the algorithms” [37]. This means that cataloging image processing algorithms mechanically leads to the emergence of concepts related to image processing.

2.2 Image Concept
Most image processing algorithms are also generic [33, 26, 27] by nature. We saw in section 2.1 that concepts emerge from behavior patterns extracted from algorithms. Similarly to fig. 4, let us consider the morphological dilation of an image $f : E \to F$ (defined on a domain $E$ with values in $F$) by a flat structuring element (SE) $B$ (we note $B_x$ the SE centered in $x$). The dilation is defined as $\delta_f(x) = \sup\{f(y),\ y \in B_x\}$; the generic algorithm is given in fig. 5. As one can see, the implementation does not rely on a specific implementation of images. It could be 2D images, 3D images or even a graph (the SE could be the adjacency relation graph).

    template <class I, class SE>
    void dilation(I input, I output, SE se) {
      for (auto p : output.domain()) {
        value_t<I> s = min_of_v<value_t<I>>;  // neutral element of max
        for (auto q : se(p))
          s = max(s, input(q));
        output(p) = s;
      }
    }

Figure 5: Generic dilation algorithm.

The image requirements can be extracted from this algorithm. The image must provide a way to access its domain E, which must be iterable. The structuring element must act as a function that returns a range of elements having the same type as the domain elements (let us call them points of type P). The image has to provide a way to access the value at a given point (f(x)) with x of type P. Last, as in fig. 4, image values (of type V) have to support max and have a neutral element “0”. From this follow the (simplified) Image concept and the constrained dilation algorithm in fig. 6. Actually, the requirements for being an image are quite light. This provides versatility and allows us to pass non-regular “image” objects as inputs, such as the image views in section 3.

    template <class I>
    concept Image = requires {
      typename point_t<I>;  // Type of point (P)
      typename value_t<I>;  // Type of value (V)
    } && requires (I f, point_t<I> p, value_t<I> v) {
      { v = f(p) };             // read access
      { f(p) = v };             // optional, for output
      { f.domain() } -> Range;  // (actually Range of P)
    };

    template <class SE, class P>
    concept StructuringElement =
      requires (SE se, P p) {
        { se(p) } -> Range;  // (actually Range of P)
      };

    template <Image I, class SE>
    void dilation(I input, I output, SE se)
      requires MaxMonoid<value_t<I>>
            && StructuringElement<SE, point_t<I>>
    { ... }

Figure 6: Image and Structuring Element concept and constrained version of the dilation algorithm.

While C++20 provides all the tools necessary to properly define concepts and to leverage them when implementing algorithms, it is still necessary to make the inventory of the algorithm families (explained in section 2.1) in order to actually extract the concepts related to image processing. This extraction process is detailed more in depth by the authors in [34].

We performed the image processing concept extraction and made it available alongside the image processing library Pylene [13].

2.3 Genericity, Ease of use, Specialization and Performance
It is often argued against generic programming that a single implementation cannot be performance-optimal for every type. For example, the generic implementation of the dilation for n-dimensional buffer images converts points into indices to access the data in the buffer, while it could use indices directly if the data are contiguous in memory. We claim that this is not a problem of the generic programming paradigm, as there exist several algorithms for the same image operator. Performance is a matter of optimization, i.e., transforming or adapting the code into an equivalent code that performs better. Some optimizations, mostly low-level ones, are within the grasp of compilers, while some high-level optimizations are just not reachable by compilers. The dilation operation allows some drastic optimizations based on the type of the inputs: if the SE is decomposable, use a sequence of dilations with simpler SEs; if the SE is a line, use a dedicated O(n) 1D algorithm [18]; if the data is a contiguous buffer of basic types and the SE is a line, use the 1D vertical dilation with vector processing; if the extent of the SE is small, perform the dilation with a fixed-size mask. C++ generic programming (GP) does not mean that a single implementation will cover all these cases. It cannot, as some of these decisions depend on runtime conditions. However, it aims at providing n algorithms to cover m combinations of inputs with n ≪ m, and at easing the selection of the best implementation based on compile-time features of the inputs. Modern C++ has greatly eased compile-time selection with concepts and type properties, as shown in fig. 7, which mixes overload selection with concept refinement and specialization ordering. Even if the third implementation is very specific to some inputs, it is still generic enough to cover all the native basic types (float, uint8, uint16...) so that we do not have to duplicate code for each of them.

    template <Image I, class SE> // (1)
    requires MaxMonoid<value_t<I>> &&
             StructuringElement<SE, point_t<I>>
    void dilation(I input, I output, SE se)
    { /* Generic impl. */ }

    template <Image I, class SE> // (2)
    requires MaxMonoid<value_t<I>> &&
             DecomposableStructuringElement<SE, point_t<I>>
    void dilation(I input, I output, SE se)
    { /* Decomposition-based impl. */ }

    template <class V> // (3)
    requires is_arithmetic_v<V>
    void dilation(buffer2d<V> in, buffer2d<V> out, vline2d se)
    { /* SIMD impl. of 1D version */ }

Figure 7: Dilation implementation specialization based on compile-time predicates. (1) is the generic fall-back overload; (2) is selected based on constraints ordering and concept refinement of the structuring element; (3) is selected based on the ordering rules for template specializations.

Finally, it is commonly held that there is a rule of three about genericity, performance and ease of use. The rule states that one can only have two of those items by sacrificing the third one. If one wants to be generic and efficient, then the naive solution will be very complex to use, with lots of parameters. If one wants a solution to be generic and easy to use, then it will not be very efficient by default. If one wants a solution to be easy to use and efficient, then it will not be very generic. This rule can be empirically verified by looking at existing C++ libraries such as [5]. We assert that C++ concepts, used wisely as demonstrated previously, make it possible to break through this rule. A piece of code can now be generic, efficient and easy to use, all at the same time.

3 Another View of Images for Genericity and Performance

3.1 Ranges and Views in C++20 STL
C++20 ranges [31] formalize the concept of view, extending the array views implemented in array-manipulation libraries [42, 2], and this concept is transferable to the Image concept. In the STL, there is a distinction between the container owning the data buffer, the iterators related to traversing this container, the range encapsulating the iterator pair that allows traversing the container, and the view which mutates the way the base range traverses the data it is related to.

All those abstraction levels need a properly refined design regarding data ownership and the lifetime of the different objects, depending on what they refer to. For instance, a range may not be cheap to copy as it may contain data in order to prolong the lifetime of the underlying object, for instance to extend the lifetime of a temporary range in a pipe. Another issue related to ranges is the semantics of constness. Indeed, the standard has to define what it means for a range view to be const. Does it propagate the constness to the underlying data or does it impact the view capsule only?
In our design, all images have reference semantics and are cheap to copy. An image view, as a lightweight object that acts like an image, models the Image concept. For example, it can be a random generator image object which generates a value whenever f(p) is called, or an observer image that records the number of times each pixel is accessed in order to compare the performance of algorithms. In some pre-C++11 libraries (e.g. the GIL [6] or Milena [27]), image views were also present (named morphers alongside the SCOOP pattern [10, 17]) but were not compatible with modern C++ idioms (e.g. the range-based for loop) and not as well developed as in [31]. However, the idea remains the same and modern C++ eases their development.
Among image views, we give a particular focus on image adaptors. Let v = transform(u1, u2, ..., un, h) where the ui are input images and h is an n-ary function. transform returns an image generated (adapted) from other image(s), as shown in fig. 8. An adaptor does not “own” data but records the transformation h and the pointer to the input images. The properties of the resulting view depend on h. On the one hand, the projection h: (r, g, b) ↦ g that selects the green component of an RGB triplet gives a view v that is writable, with 8-bit integer values, and has the same domain as u1. On the other hand, the projection h: (a, b) ↦ (a+b)/2, applied on images u1 and u2, gives a read-only view that computes pixel-wise the average of u1 and u2.

Figure 8: An image view performing a thresholding. [Diagram: h = [](int x) returning 0 if x < 150 and 255 if x ≥ 150; u is the input image; auto v = transform(u, h) yields the thresholded image, the view v being equivalent to the pair (h, pointer to u).]

Following the same principle, a view can apply a restriction on an image domain. In fig. 9, we show the adaptor clip(input, roi) that restricts the image to a non-regular roi and filter(input, predicate) that restricts the domain based on a predicate. All subsequent operations on those images will only affect the selected pixels.

Figure 9: Clip and filter image adaptors that restrict the image domain by a non-regular ROI and by a predicate that selects only even pixels. [Diagram: clip(image, DiamondShape ROI) → clipped image; filter(image, [](int x) { return (x % 2) == 0; }) → filtered image.]

3.2 Views applied to image processing
Views feature many interesting properties that change the way we program an image processing application. To illustrate those features, let us consider the following image processing pipeline: (Start) load an input RGB-16 2D image (a classical HDR photography); (A) convert it to grayscale; (B) sub-quantize it to 8 bits; (C) perform the grayscale dilation of the image; (End) save the resulting 2D 8-bit grayscale image, as described in fig. 10.

Figure 10: Example of a simple image processing pipeline illustrating the difference between the composition of algorithms and image views. [Diagram: Input (RGB-16) → Grayscale Conversion → Sub-quantization (8-bit conversion) → Dilation → Output (Gray 8-bit); labels: “Algorithm Composition = MyComplexOperator” and “Image Views Composition = MyComplexImage”.]
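For reference, the pipeline (A), (B), (C) of section 3.2 can be written in the classical way, where every step is a separate algorithm producing a full image. The sketch below is a simplified assumption: images are flat std::vector buffers, loading and saving are left out, the function names are hypothetical, and the dilation is reduced to a horizontal segment of 3 pixels to keep the code short. The clamp to 255 in the sub-quantization is an addition of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct rgb16 { std::uint16_t r, g, b; };

// (A) grayscale conversion: allocates a first intermediate image
inline std::vector<float> to_gray(const std::vector<rgb16>& in) {
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = (in[i].r + in[i].g + in[i].b) / 3.f;
  return out;
}

// (B) sub-quantization from 16 to 8 bits: allocates a second intermediate image
inline std::vector<std::uint8_t> to_8bits(const std::vector<float>& in) {
  std::vector<std::uint8_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = static_cast<std::uint8_t>(std::min(in[i] / 256 + .5f, 255.f));
  return out;
}

// (C) dilation by a horizontal segment of 3 pixels on a row-major buffer
// (width must be non-zero)
inline std::vector<std::uint8_t> dilate3(const std::vector<std::uint8_t>& in,
                                         std::size_t width) {
  std::vector<std::uint8_t> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    const std::size_t x = i % width;
    std::uint8_t s = in[i];
    if (x > 0) s = std::max(s, in[i - 1]);
    if (x + 1 < width) s = std::max(s, in[i + 1]);
    out[i] = s;
  }
  return out;
}

// Algorithm composition: three full traversals and two temporary images.
inline std::vector<std::uint8_t> my_complex_operator(const std::vector<rgb16>& input,
                                                     std::size_t width) {
  return dilate3(to_8bits(to_gray(input)), width);
}
```

Each step is simple and reusable, but the float image and the first 8-bit image exist only to be read once by the next step, which is the cost that image views aim to remove.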

Views are composable. One of the most important features in a pipeline design (and generally in software engineering) is object composition. It enables composing simple blocks into complex ones. Those complex blocks can then be managed as if they were still simple blocks. In fig. 10, we have 3 simple image operators Image → Image (the grayscale conversion, the sub-quantization, the dilation). As shown in fig. 10, algorithm composition would consider these 3 simple operators as a single complex operator Image → Image that could then be used in another, even more complex processing pipeline. Just like algorithms, image views are composable, e.g. a view of the view of an image is still an image. In fig. 10, we compose the input image with a grayscale transform view and a sub-quantization view that then feeds the dilation algorithm.
Views improve usability. The code to compose images in fig. 10 is almost as simple as:

    auto input = imread(...);
    auto A = transform(input, [](rgb16 x) -> float {
      return (x.r + x.g + x.b) / 3.f; });
    auto MyComplexImage = transform(A, [](float x) -> uint8_t {
      return x / 256 + .5f; });

People familiar with functional programming may notice similarities with these languages where transform (map) and filter are sequence operators. Views use the functional paradigm and are created by functions that take a function as argument: the operator or the predicate to apply for each pixel; we do not iterate by hand on the image pixels.
Views improve re-usability. The code snippets above are simple but not very re-usable. However, following the functional programming paradigm, it is quite easy to define new views, because some image adaptors can be considered as high-order functions for which we can bind some parameters. In fig. 11, we show how the primitive transform can be used to create a view summing two images and a view operator performing the grayscale conversion as well as the sub-quantization, which can be reused afterward (1).

(1) These functions could have been written in a more generic way for more re-usability, but this is not the purpose here.

    auto operator+(Image auto A, Image auto B) {
      return transform(A, B, std::plus<>());
    }
    auto togray = [](Image auto A) { return transform(A, [](auto x)
        { return (x.r + x.g + x.b) / 3.f; });
    };
    auto subquantize16to8b = [](Image auto A) { return transform(A,
        [](float x) { return uint8_t(x / 256 + .5f); });
    };

    auto input = imread(...);
    auto MyComplexImage = subquantize16to8b(togray(input));

Figure 11: Using high-order primitive views to create custom view operators.

Views for lazy computing. Because the operation is recorded within the image view, this new image type allows fundamental image types to be mixed with algorithms. In fig. 11, the creation of views does not involve any computation in itself but rather delays the computation until the expression v(p) is invoked. Because views can be composed, the evaluation can be delayed quite far. Image adaptors are template expressions [41, 42] as they record the expression used to generate the image as a template parameter. A view actually represents an expression tree (fig. 14).
Views for performance. With a classical design, each operation of the pipeline is implemented on “its own”. Each operation requires memory to be allocated for the output image and also requires that the image is fully traversed. This design is simple, flexible and composable, but is neither memory efficient nor computation efficient. With the lazy evaluation approach, the image is traversed only once (when the dilation is applied), which has two benefits. First, there are no intermediate images, which is very memory effective. Second, traversing the image is faster thanks to a better memory cache usage. Indeed, in our example (fig. 10), processing an RGB16 pixel from the dilation algorithm directly converts it to grayscale, then sub-quantizes it to 8 bits, and finally makes it available for the dilation algorithm. It acts as if we were writing an optimal operator that would combine all these operations. This approach is somewhat related to the kernel-fusing operations available in some HPC specifications [24], but views-fusion is optimized by the C++ compiler only [9].
Views for productivity. All point-wise image processing algorithms can (and should) be rewritten intuitively by using a one-liner view. The transform view is the key enabler here. This implies that there exists a new abstraction level available to practitioners when prototyping their algorithms. The time spent implementing features is reduced, thus the feedback-loop time is reduced too. This brings the practitioner a productivity gain.
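The lazy behaviour of image adaptors described in section 3.2 can be reproduced with a very small transform view. Everything in this sketch is a hypothetical simplification: buffer_image is a 1D image with reference semantics (copies share the pixels through a std::shared_ptr), points are plain indices, and transform_view only supports read access. It is not the Pylene implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical buffer image with reference semantics: copies are cheap and
// share the same pixels.
template <class V>
struct buffer_image {
  std::shared_ptr<std::vector<V>> data;
  explicit buffer_image(std::vector<V> v)
      : data(std::make_shared<std::vector<V>>(std::move(v))) {}
  std::size_t size() const { return data->size(); }
  V operator()(std::size_t p) const { return (*data)[p]; }
};

// Image adaptor: records the input image and the function, computes nothing.
template <class I, class F>
struct transform_view {
  I input;
  F f;
  std::size_t size() const { return input.size(); }
  auto operator()(std::size_t p) const { return f(input(p)); }  // evaluated on access
};

template <class I, class F>
transform_view<I, F> transform(I input, F f) {
  return {std::move(input), std::move(f)};
}

struct rgb16 { std::uint16_t r, g, b; };

// A view of a view is still an image: grayscale conversion followed by
// sub-quantization, without any intermediate buffer.
inline auto to_gray_8bits(buffer_image<rgb16> input) {
  auto gray = transform(std::move(input),
                        [](rgb16 x) { return (x.r + x.g + x.b) / 3.f; });
  return transform(std::move(gray),
                   [](float x) { return static_cast<std::uint8_t>(x / 256 + .5f); });
}
```

The type of the object returned by to_gray_8bits encodes the whole expression (a transform_view of a transform_view of a buffer_image), so the compiler sees both lambdas at the call site of operator() and can inline them into the loop of whichever algorithm consumes the view.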

8
too. This yields a productivity gain for the practitioner.

3.3 Reasoning at image level

The final argument we bring in our discussion about views is that IP practitioners raise
their reasoning by one level. Let us take the alpha-blending algorithm as a supporting
example. The default code for a classical, handmade (and error-prone) C++ alpha-blending
is presented in fig. 12. This algorithm makes several irrelevant assumptions about the
image type. Indeed, it is not relevant to the final application whether the image's
color is 8-bit RGB or float. Also, the practitioner may only need to process a specific
color channel, or a specific region of the image. The image may also be 3D. To
summarize, many assumptions that are not relevant to the application logic still weigh
on the resulting implementation, which leads us to the need for genericity. The solution
is to shift the abstraction level by one layer and reason at image level, as shown in
fig. 13 which presents the code and the produced view expression tree. Rewriting the
low-level algorithm in terms of views is as simple as in fig. 14. Finally, we also show
in fig. 15 how simple it now becomes to restrict input images to a specific region or a
specific color channel directly by chaining views when reasoning at image level. The
code has become more readable, more expressive and more efficient by default.

    void blend_inplace(const uint8_t* ima1, uint8_t* ima2, float alpha,
                       int width, int height, int stride1, int stride2) {
      for (int y = 0; y < height; ++y) {
        const uint8_t* iptr = ima1 + y * stride1;
        uint8_t* optr = ima2 + y * stride2;
        for (int x = 0; x < width; ++x)
          optr[x] = iptr[x] * alpha + optr[x] * (1-alpha);
      }
    }

Figure 12: Alpha-blending with classical C/C++ code.

    ima ← 0.2 × ima1 + 0.8 × ima2

Figure 13: Alpha-blending algorithm written at image level.

    auto alphablend =
      [](auto ima1, auto ima2, float alpha) {
        return alpha * ima1 +
               (1 - alpha) * ima2; };

    [Expression tree: root +, with two ∗ children; leaves f, alpha and g, 1 − alpha]

Figure 14: Alpha-blending, generic implementation with views, expression tree.

    auto ima = blend(ima1, ima2, 0.2);                            // User-defined view
    auto ima_roi = blend(clip(ima1, roi), clip(ima2, roi), 0.2);  // ROI
    auto ima_red = blend(red(ima1), red(ima2), 0.2);              // Red channel

Figure 15: Chaining views to feed alpha-blending.

4 Comparison with Data Flow oriented frameworks

A parallel can be drawn between image views and the data flow oriented programming [15]
style used in Data Science, such as the Apache Spark technology [45, 20], the Hadoop
system [3] or even TensorFlow [1]. Indeed, we find similar properties in those data flow
systems, such as composition and lazy computing. Let us focus on the Apache Spark
technology for this comparison. This technology is designed in two parts: first is the
Spark programming model that creates a dependency graph; second is the runtime system,
which schedules work units on a cluster for the execution of the previously built graph
and transports code and data to the relevant worker nodes.

[Figure 16 diagram. Block labels: Background (RGB-8), Input (RGB-8), Grayscale Conversion
(two blocks), Gaussian, Subtract, Thresholding, Opening (Erosion + Dilation), Output
(Gray 8-bit)]

The Spark programming model proceeds in three steps. First is the partitioning function,
used with a homogeneous collection of objects to construct the Resilient Distributed
Dataset (RDD) from our data. Second are the transformations, which consist of a pipeline
of higher-order functions (e.g. map, filter, ...) chained with each other. Each
transformation returns a new RDD which depends on the old RDD. Finally, an action
(reduce) is performed on the RDD. At that time, the whole transformation pipeline is
applied to the RDD and computation is scheduled on worker nodes. Furthermore, the Spark
programming model allows the developer to fine-tune how the program should handle
intermediate results (e.g. save them on storage for later reuse).
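The transformation/action split described above can be mimicked in plain C++ to make the laziness concrete. The sketch below is only an analogy and does not use Spark's API: `Source`, `Map`, `Filter`, `lazy_map`, `lazy_filter`, `reduce_sum`, `DemoResult` and `run_demo` are hypothetical names introduced for illustration. Each transformation merely records its predecessor and its function, and the work happens once, when the action walks the chain.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Source stage: refers to existing data without copying it.
struct Source {
  const std::vector<int>* data;
  template <class Sink>
  void for_each(Sink&& sink) const {
    for (int v : *data) sink(v);
  }
};

// Lazy "map" transformation: only stores the previous stage and the function.
template <class Prev, class F>
struct Map {
  Prev prev;
  F f;
  template <class Sink>
  void for_each(Sink&& sink) const {
    prev.for_each([&](auto v) { sink(f(v)); });
  }
};

// Lazy "filter" transformation: same idea, with a predicate.
template <class Prev, class P>
struct Filter {
  Prev prev;
  P pred;
  template <class Sink>
  void for_each(Sink&& sink) const {
    prev.for_each([&](auto v) { if (pred(v)) sink(v); });
  }
};

template <class Prev, class F>
Map<Prev, F> lazy_map(Prev prev, F f) { return {std::move(prev), std::move(f)}; }

template <class Prev, class P>
Filter<Prev, P> lazy_filter(Prev prev, P pred) { return {std::move(prev), std::move(pred)}; }

// "Action": walks the whole chain once and folds the results.
template <class Stage>
long reduce_sum(const Stage& stage) {
  long acc = 0;
  stage.for_each([&](auto v) { acc += v; });
  return acc;
}

struct DemoResult { long sum; int calls_before; int calls_after; };

// Builds filter -> map lazily, then triggers the computation with one action.
inline DemoResult run_demo() {
  const std::vector<int> data{1, 2, 3, 4, 5, 6};
  int calls = 0;  // counts how often the map function really runs
  auto evens = lazy_filter(Source{&data}, [](int v) { return v % 2 == 0; });
  auto squares = lazy_map(evens, [&calls](int v) { ++calls; return v * v; });
  const int before = calls;              // nothing has been computed yet
  const long sum = reduce_sum(squares);  // the action triggers the work
  return {sum, before, calls};
}
```

With the input 1 to 6, `run_demo()` reports that the map function has not run before the action and has run three times afterwards (once per even element), for a sum of 56. The whole chain is encoded in the static type of `squares`, which is the property the comparison with views relies on.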
Figure 16: Pipeline for foreground extraction using algorithms and views.

It is very similar to our view design in that transformations can be compared to views
(computed lazily and chainable) and actions can be compared to our algorithms (which
perform the work and resolve the transformations). However, it differs from views at
execution time. Views only do computation on the part of the image that is requested by
the final algorithm, whereas the data flow pipeline may perform a transformation on the
whole dataset prior to a narrowing transformation (filtering for instance). The dynamic
model enables transformations to be distributed asynchronously on clusters when
performing actions, which act as barriers in the pipeline, but it does not prevent
inefficient and unnecessary computation, due to the nature of the acyclic computation
graph built on the successively transformed RDDs. Indeed, RDDs are immutable. In
contrast, views are static, their composition is static and there is no need for a
framework for that. Also, computation can be done in-place through projector views,
which is very memory efficient.

Finally, our design differs in the sense that views are still image types (with an
embedded operation). When reasoning about images, IP practitioners can focus on the
behavior of their images and algorithms. On the other hand, a data flow programmer
focuses on the data and how to transform it in order to extract information.
Design-wise, an RDD is a generalized super-type of data, more flexible due to its
dynamic nature, but it does not abstract away the underlying complexity incurred by the
processed data.

5 Experimentation

To highlight the interest of GP and views in the context of performance-sensitive
applications, we study the impact on a simple but real-case image processing pipeline
aiming at extracting objects from a background, as depicted in fig. 16. Simply said, it
computes the difference between an image and a registered image. The Gaussian blur and
the morphological opening provide some robustness to noise. The pipeline is implemented
with (1) OpenCV, (2) our library (Pylene) where each step is a computing operator, (3)
our library where the purple blocks are views. This pipeline actually produces
interesting results, as shown in fig. 17. In table 2, we benchmark the computation time
and the memory usage² of these implementations (all single-threaded) with an opening by
a disc of radius 32 on 10 MPix RGB images (the minimum of many runs is kept).

² Memory usage is computed with valgrind/massif as the difference between the memory
peak of the run and the memory peak without any computation (just setup and image
loading).

[Figure 17 panels: Background, Candidate, Result]

Figure 17: Foreground extraction: sample result.

    Framework            Compute time       Memory usage   ∆ Memory usage
    Pylene (w/o views)   2.11s (± 144ms)    106 MB         +0%
    OpenCV               2.41s (± 134ms)    59 MB          -44%
    Pylene (views)       2.13s (± 164ms)    51 MB          -52%

Table 2: Benchmarks of the pipeline of fig. 16 on a dataset (12 images) of 10 MPix
images. Average computation time and memory usage of implementations with/without views
and with OpenCV as a baseline.

The results should not be misunderstood. They do not say that OpenCV is faster or slower
but show that all implementations have the same order of magnitude of processing time
(the algorithms used in our implementation are not the same as those used in
OpenCV for blur and dilation/erosion) so that the comparison makes sense. It allows us
to validate experimentally the advantages of views in pipelines. First, we have to be
cautious about the real benefit in terms of processing time. Here, most of the time is
spent in algorithms that are not eligible for view transformation. Thus, depending on
the operations of the pipeline, views may not improve processing time. Nevertheless,
using views does not degrade performance either (only 1% in this experiment). It seems
to show that using views does not introduce performance penalties and may even be
beneficial in lightweight pipelines such as the one in section 3. On the memory side,
views drastically reduce the memory usage, which is beneficial when developing
memory-constrained applications. From the developer standpoint, it requires only a few
changes in the code as shown in fig. 18 (the implementation of the algorithms remains
the same), which is a real advantage for software maintenance.

    float kThreshold = 150; float kVSigma = 10;
    float kHSigma = 10; int kOpeningRadius = 32;
    auto img_gray = view::transform(img_color, to_gray);    // lazy view
    auto bg_gray = view::transform(bg_color, to_gray);      // lazy view
    auto bg_blurred = gaussian2d(bg_gray, kHSigma, kVSigma);
    auto tmp_gray = img_gray - bg_blurred;
    auto thresholdf = [](auto x) { return x < kThreshold; };
    auto tmp_bin = view::transform(tmp_gray, thresholdf);   // lazy view
    auto ero = erosion(tmp_bin, disc(kOpeningRadius));
    dilation(ero, disc(kOpeningRadius), output);

Figure 18: Pipeline implementation with views. Highlighted code uses views by prefixing
operators with the namespace view.

6 Conclusion

Thanks to simple yet concrete examples, we have shown how modern C++ and the generic
programming paradigm can ease image processing software development. We have given a
particular focus to the concepts of image views and have shown that they improve both
the performance and the usability of an image processing framework. These ideas have
been implemented in our C++20 library [13] and used for concrete image processing
applications (medical imaging and document analysis). We have compared our design to
existing similar designs in data flow oriented programming and outlined the main
differences. Nonetheless, generic programming in C++ comes with some downsides.
Templates belong to the static world, and selecting an algorithmic specialization based
on runtime conditions is not trivial. It requires either ahead-of-time generation of
specializations, which increases compile times and does not scale with the size of the
parameter space, or switching to a more dynamic paradigm that could degrade performance.
Dealing with dynamism should not be optional when exposing a static library to a dynamic
language like Python. As future work, we will research ways to address this issue.

References

[1] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous
systems, 2015. Software available from tensorflow.org.

[2] B. Andres, U. Koethe, T. Kroeger, and F.A. Hamprecht. Runtime-flexible
multi-dimensional arrays and views for C++98 and C++0x. arXiv preprint arXiv:1008.2909,
IWR, Univ. of Heidelberg, Germany, 2010.

[3] Apache Software Foundation. Hadoop.

[4] G. Berti. GrAL–the grid algorithms library. Future Generation Computer Systems,
22(1-2):110–122, 2006.

[5] Boost. Boost C++ libraries.

[6] L. Bourdev. Generic image library. http://www.lubomir.org/pdfs/GIL_SDJ.pdf, 2020.

[7] G. Bradski. The OpenCV library. Dr. Dobb's Journal of Software Tools, 25:122–125,
November 2000.

[8] F. Brill and E. Albuz. NVIDIA VisionWorks toolkit. Presented at the 2014 GPU
Technology Conference, 2014.

[9] G. Brown, C. Di Bella, M. Haidl, T. Remmelg, R. Reyes, and M. Steuwer. Introducing
parallelism to the ranges TS. In Proceedings of the International Workshop on OpenCL,
pages 1–5, 2018.

[10] Nicolas Burrus et al. A static C++ object-oriented programming (SCOOP) paradigm
mixing benefits of traditional OOP and generic
programming. In Proceedings of the Workshop on Multiple Paradigm with Object-Oriented
Languages (MPOOL), Anaheim, CA, USA, October 2003.

[11] I. Burylov, M. Chuvelev, B. Greer, G. Henry, S. Kuznetsov, and B. Sabanin. Intel
performance libraries: Multi-core-ready software for numeric-intensive computation.
Intel Technology Journal, 11(4), 2007.

[12] E. Carlinet et al. MToS: A tree of shapes for multivariate images. IEEE
Transactions on Image Processing, 24(12):5330–5342, 2015.

[13] E. Carlinet et al. Pylene: a modern C++ image processing generic library, 2018.
https://gitlab.lrde.epita.fr/olena/pylene.

[14] D. Coeurjolly, J.-O. Lachaud, and B. Kerautret. DGtal: Digital geometry tools and
algorithms library, 2019. https://dgtal.org/.

[15] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large
clusters. Commun. ACM, 51(1):107–113, January 2008.

[16] J.C. Dehnert and A. Stepanov. Fundamentals of generic programming. In Generic
Programming, volume 1766 of LNCS, pages 1–11. Springer, 2000.

[17] Thierry Géraud et al. Semantics-driven genericity: A sequel to the static C++
object-oriented programming paradigm (SCOOP 2). In Proceedings of the 6th International
Workshop on Multiparadigm Programming with Object-Oriented Languages (MPOOL), Paphos,
Cyprus, July 2008.

[18] J. Y. Gil and R. Kimmel. Efficient dilation, erosion, opening, and closing
algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence,
24(12):1606–1617, 2002.

[19] B. Grünbaum and G. C. Shephard. Tilings and Patterns. W. H. Freeman & Co., 1986.

[20] Dries Harnie et al. Scaling machine learning for target prediction in drug
discovery using Apache Spark. Future Generation Computer Systems, 67:409–417, 2017.

[21] H. Homann and F. Laenen. SoAx: A generic C++ structure of arrays for handling
particles in HPC codes. Computer Physics Communications, 224:325–332, 2018.

[22] J. Järvi, M.A. Marcus, and J.N. Smith. Library composition and adaptation using C++
concepts. In Proceedings of the 6th International Conference on Generative Programming
and Component Engineering, pages 73–82, 2007.

[23] E. Jones, T. Oliphant, P. Peterson, et al. SciPy: Open source scientific tools for
Python, 2001–. http://www.scipy.org.

[24] Khronos Group. OpenVX. https://www.khronos.org/openvx/, 2019.

[25] U. Köthe. STL-style generic programming with images. C++ Report Magazine,
12(1):24–30, 2000. https://ukoethe.github.io/vigra.

[26] R. Levillain et al. Why and how to design a generic and efficient image processing
framework: The case of the Milena library. In Proceedings of the IEEE Intl. Conf. on
Image Processing (ICIP), pages 1941–1944, Hong Kong, 2010.

[27] R. Levillain et al. Practical genericity: Writing image processing algorithms both
reusable and efficient. In Proc. of the 19th Iberoamerican Congress on Pattern
Recognition (CIARP), volume 8827 of LNCS, pages 70–79. Springer, 2014.

[28] F. Meyer and J. Stawiaski. Morphology on graphs and minimum spanning trees. In
Proc. of the Intl. Symp. on Mathematical Morphology (ISMM), volume 5720 of LNCS, pages
161–170. Springer, 2009.

[29] C. Misale, M. Drocco, G. Tremblay, and others. PiCo: High-performance data
analytics pipelines in modern C++. Future Generation Computer Systems, 87:392–403, 2018.

[30] David R. Musser and Alexander A. Stepanov. Generic programming. In Intl. Symp. on
Symbolic and Algebraic Computation, pages 13–25. Springer, 1988.
[31] E. Niebler and C. Carter. P1037R0: Deep integration of the ranges TS, May 2018.
https://wg21.link/p1037r0.

[32] B. Perret, G. Chierchia, J. Cousty, S.J. F. Guimarães, Y. Kenmochi, and L. Najman.
Higra: Hierarchical graph analysis. SoftwareX, 10:100335, 2019.

[33] G. X. Ritter, J. N. Wilson, and J. L. Davidson. Image algebra: An overview.
Computer Vision, Graphics, and Image Processing, 49(3):297–331, 1990.

[34] M. Roynard, E. Carlinet, and T. Géraud. An image processing library in modern C++:
Getting simplicity and efficiency with generic programming. In Reproducible Research in
Pattern Recognition, 2nd Intl. Workshop, volume 11455 of LNCS, pages 121–137. Springer,
2019.

[35] R. Smith. N4849: Working draft, standard for programming language C++. Technical
report, January 2020. https://wg21.link/n4849.

[36] Alexander Stepanov and Meng Lee. The standard template library, volume 1501.
Hewlett Packard Laboratories 1501 Page Mill Road, Palo Alto, CA 94304, 1995.

[37] Alexander Stepanov and Paul McJones. Elements of Programming. Addison-Wesley
Professional, June 2009.

[38] B. Stroustrup. Evolving a language in and for the real world: C++ 1991-2006. In
Proc. of the 3rd ACM SIGPLAN Conf. on History of Programming Languages, volume 4, pages
1–59, New York, USA, 2007.

[39] D. Tschumperlé. The CImg library. Online report, June 2012.
https://hal.archives-ouvertes.fr/hal-00927458.

[40] S. van der Walt et al. Scikit-Image: Image processing in Python. PeerJ, June 2014.
DOI 10.7717/peerj.453.

[41] T. L. Veldhuizen. Expression templates. C++ Report, 7(5):26–31, 1995.

[42] T. L. Veldhuizen. Blitz++: The library that thinks it is a compiler. In Advances in
Software Tools for Scientific Computing, volume 10 of Lecture Notes on Computational
Science and Engineering, pages 57–87. Springer, 2000.

[43] M. Werner. GIS++: Modern C++ for efficient and parallel in-memory spatial
computing. In Proc. of the ACM SIGSPATIAL Intl. Workshop on Geospatial Data Access and
Processing APIs, pages 1–2, 2019.

[44] Y. Xu, T. Géraud, and L. Najman. Connected filtering on tree-based shape-spaces.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(6):1126–1140, 2015.

[45] Matei Zaharia et al. Resilient distributed datasets: A fault-tolerant abstraction
for in-memory cluster computing. In 9th USENIX Symposium on Networked Systems Design and
Implementation (NSDI 12), pages 15–28, San Jose, CA, April 2012. USENIX Association.
