3D SIMP Method in Matlab for Compliance-Based Topology Optimization
https://doi.org/10.1007/s00158-020-02629-w
EDUCATIONAL PAPER
Received: 18 February 2020 / Revised: 2 May 2020 / Accepted: 10 May 2020 / Published online: 24 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
Compact and efficient Matlab implementations of compliance topology optimization (TO) for 2D and 3D continua are given,
consisting of 99 and 125 lines respectively. On discretizations ranging from $3 \cdot 10^4$ to $4.8 \cdot 10^5$ elements, the 2D version,
named top99neo, shows speedups from 2.55 to 5.5 times compared to the well-known top88 code of Andreassen et al.
(Struct Multidiscip Optim 43(1):1–16, 2011). The 3D version, named top3D125, is the most compact and efficient Matlab
implementation for 3D TO to date, showing a speedup of 1.9 times compared to the code of Amir et al. (Struct Multidiscip
Optim 49(5):815–829, 2014), on a discretization with $2.2 \cdot 10^5$ elements. For both codes, improvements are due to much
more efficient procedures for the assembly and implementation of filters and shortcuts in the design update step. The use of
an acceleration strategy, yielding major cuts in the overall computational time, is also discussed, stressing its easy integration
within the basic codes.
Coincidentally, the new 2D TO implementation consists of 99 lines of code and is thus named top99neo. We also show how to include an acceleration technique recently investigated for TO by Li et al. (2020), with a few extra lines of code and potentially carrying major speedups. Changes needed for the extension to 3D problems are remarkably small, making the corresponding code (top3D125) the most compact and efficient Matlab implementation for 3D compliance TO to date.
Our primary goal is not to present innovative new research. Rather, we aim at sharing some shortcuts and speedups that we have noticed through time, to the benefit of the research community. Improvements introduced by the present codes will also be very useful on more advanced problems, such as buckling optimization, which will be dealt with in an upcoming work.
The paper is organized as follows. In Section 2, we recall the setting of TO for minimum compliance. Section 3 is devoted to describing the overall structure of the 2D code, focusing on differences with respect to top88. Sections 3.1–3.5 give insights about the main speedups and show performance improvements with respect to top88. The very few changes needed for the 3D code are listed in Section 4, where an example is presented and the efficiency is compared to the previous code from Amir et al. (2014). Some final remarks are given in Section 5. Appendix A gives some details about the redesign step that are useful for better understanding a method proposed in Section 3.2 and the Matlab codes are listed in Appendices B and C.

2 Problem formulation and solution scheme

We consider a 2D/3D discretization $\Omega_h$ consisting of $m$ equi-sized quadrilateral elements $\Omega_e$. Hereafter we denote by $n$ the global number of Degrees of Freedom (DOFs) in the discretization and by $d$ the number of (local) DOFs of each element.
Let $\mathbf{x} = \{x_e\}_{e=1:m} \in [0,1]^m$ be partitioned between $\mathbf{x}_A$ and $\mathbf{x}_P$, the sets of active (design) variables and passive elements, respectively. The latter may be further split in the sets of passive solid $P_1$ ($x_e = 1$) and void $P_0$ ($x_e = 0$) elements, of cardinalities $m_{P_1}$ and $m_{P_0}$, respectively (see Fig. 1a).
The set of physical variables $\hat{\mathbf{x}}_A = \mathcal{H}(\tilde{\mathbf{x}})$ is defined by the relaxed Heaviside projection (Wang et al. 2011)
$$\mathcal{H}(\tilde{x}_e, \eta, \beta) = \frac{\tanh(\beta\eta) + \tanh(\beta(\tilde{x}_e - \eta))}{\tanh(\beta\eta) + \tanh(\beta(1 - \eta))} \qquad (1)$$
with threshold $\eta$ and sharpness factor $\beta$, where $\tilde{\mathbf{x}} = H\mathbf{x}$ is the filtered field, obtained by the linear operator
$$H(x_e, r_{\min}) := \frac{\sum_{i \in N_e} h_{e,i}\, x_i}{\sum_{i \in N_e} h_{e,i}} \qquad (2)$$
where $N_e = \{ i \mid \mathrm{dist}(\Omega_i, \Omega_e) \le r_{\min} \}$ and $h_{e,i} = \max(0, r_{\min} - \mathrm{dist}(\Omega_i, \Omega_e))$.
Given a load vector $\mathbf{f} \in \mathbb{R}^n$ and the volume fraction $f \in (0, 1)$, we consider the optimization problem
$$\begin{cases} \min_{\mathbf{x}_A \in [0,1]^{m_A}} \; c(\hat{\mathbf{x}}) \\ \text{s.t. } V(\hat{\mathbf{x}}) \le f\,|\Omega_h| \end{cases} \qquad (3)$$
for the minimization of compliance $c(\hat{\mathbf{x}}) = \mathbf{u}^T \mathbf{f}$ with an upper bound on the overall volume
$$V(\hat{\mathbf{x}}) = \sum_{e=1}^{m} |\Omega_e|\, \hat{x}_e = \frac{1}{m}\Big[ m_{P_1} + \sum_{e \in A} \hat{x}_e \Big] \le f \qquad (4)$$
Problem (3) is solved with a nested iterative loop. At each iteration, the displacement $\mathbf{u}$ is computed by solving the equilibrium problem
$$K\mathbf{u} = \mathbf{f} \qquad (5)$$
Fig. 1 Definition of the active $A$, passive solid $P_1$, and void $P_0$ domains (a) and illustration of the connectivity matrix $C$ for a simple discretization (b). The set of indices $I$, here shown for the element $e = 1$, is used by the assembly operation. The symmetric repetitions in $I$ are highlighted, and their elimination gives the reduced set $I_r$ (see Section 3.1)
A new generation 99 line Matlab code for compliance... 2213
where the stiffness matrix $K = K(\hat{\mathbf{x}})$ depends on the physical variables through a SIMP interpolation (Bendsøe and Sigmund 1999) of the Young modulus
$$E(\hat{x}_e) = E_{\min} + \hat{x}_e^{\,p} (E_0 - E_{\min}) \qquad (6)$$
with $E_0$ and $E_{\min}$ the moduli of solid and void ($E_{\min} \ll E_0$). The gradients of compliance and structural volume with respect to $\hat{\mathbf{x}}$ read ($\chi_e = 1$ if $e \in A$ and 0 otherwise and $\mathbf{1}_m$ is the identity vector of dimension $m$)
… also allows the projection (1), with eta and beta as parameters. ftBC specifies the filter boundary conditions ('N' for zero-Neumann or 'D' for zero-Dirichlet), move is the move limit used in the OC update and maxit sets the maximum number of redesign steps.
The routine is organized in a set of operations which are performed only once and the loop for the TO iterative redesign. The initializing operations are grouped as follows
PRE.1) MATERIAL AND CONTINUATION PARAMETERS
… whereas passive domains may be specified targeting a set of columns and rows from the array elNrs. Independently of the particular example, Lines 34–36 define the vector of applied loads, the set of free DOFs, and the sets of active A ↔ act design variables.
In order to make the code more compact and readable, operations which are repeatedly performed within the TO optimization loop are defined through inline functions in PRE.4) (Lines 38–43). The filter operator is built in PRE.5) making use of the built-in Matlab function imfilter, which represents a much more efficient alternative to the explicit construction of the neighboring array. A similar approach was already outlined by Andreassen et al. (2011), pointing to the Matlab function conv2, which is however not completely equivalent to the original operator, as it only allows zero-Dirichlet boundary conditions for the convolution operator. Here, we choose imfilter, which is essentially as efficient as conv2, but gives the flexibility to specify zero-Dirichlet (default option), or zero-Neumann boundary conditions.
Some final initializations and allocations are performed in PRE.6). The design variables are initialized with the modified volume fraction, accounting for the passive domains (Lines 52–53) and the constant volume sensitivity (7) is computed in Line 51.
Within the redesign loop, the following five blocks of operations are repeatedly performed

RL.1) COMPUTE PHYSICAL DENSITY FIELD
RL.2) SETUP AND SOLVE EQUILIBRIUM EQUATIONS
RL.3) COMPUTE SENSITIVITIES
RL.4) UPDATE DESIGN VARIABLES AND APPLY CONTINUATION
RL.5) PRINT CURRENT RESULTS AND PLOT DESIGN

The stiffness interpolation and its derivative (sK, dsK) are defined, and the stiffness matrix is assembled (see Lines 73–76). Ideally, one could also get rid of Lines 73–74 and directly define sK in Line 75 and dsK within Line 79. However, we decide to keep these operations apart, to enhance the readability of the code and to ease the specification of different interpolation schemes. Equation (5) is solved on Line 77 using the Matlab function decomposition, which can work with only half of the stiffness matrix (see Section 3.1). The sensitivity of compliance is computed, and the backfiltering operations (8) are performed in RL.3).
The update (10), with the nested application of the bisection process for finding $\tilde{\lambda}_k$, is implemented in RL.4) (Lines 86–91), and we remark that lm represents $\sqrt{\lambda}$.
Some information about the process is printed and the current design is plotted in RL.5) (Lines 94–97). On small discretizations, repeated plotting operations absorb a significant fraction of the CPU time (e.g., 15% for $m = 4800$). Therefore, one might just plot the final design, moving Lines 96–97 outside the redesign loop.
The tests in the following have been run on a laptop equipped with an Intel(R) Core(TM) [email protected] CPU, 15 GB of RAM, and Matlab 2018b running in serial mode under Ubuntu 18.04 (but a similar performance is expected in Windows setups). We will often refer to the half MBB beam example (see Fig. 2) for numerical testing. Unless stated otherwise, we choose $\Omega_h = 300 \times 100$, $f = 0.5$, and $r_{\min} = 8.75$ (Sigmund 2007). The load, having total magnitude $|q| = 1$, is applied to the first node. No passive domains are introduced for this example; therefore, pasS=[];, pasV=[]; and we set $E_1 = 1$, $E_0 = 10^{-9}$, and $\nu = 0.3$ in all the tests.
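To make the chain "filter (2), projection (1), interpolation (6)" concrete, the following is a minimal sketch under stated assumptions. It uses conv2 with an explicit normalization rather than the imfilter-based construction described above, and all sizes, parameter values, and the handle names filt, prj, simp, and dsimp are hypothetical choices made only for illustration.

```matlab
% Illustrative sizes and parameters (hypothetical values, not the paper's tests)
nelx = 6; nely = 4; rmin = 1.5;
penal = 3; E0 = 1; Emin = 1e-9;
eta = 0.5; beta = 2;

% Filter weights h_{e,i} = max(0, rmin - dist) on a small stencil, cf. (2)
[dy, dx] = meshgrid(-ceil(rmin)+1 : ceil(rmin)-1);
h  = max(0, rmin - sqrt(dx.^2 + dy.^2));
Hs = conv2(ones(nely, nelx), h, 'same');      % denominator of (2) for each element
filt = @(x) conv2(x, h, 'same') ./ Hs;        % filtered field

% Relaxed Heaviside projection (1)
prj = @(v) (tanh(beta*eta) + tanh(beta*(v - eta))) ./ ...
           (tanh(beta*eta) + tanh(beta*(1 - eta)));

% SIMP interpolation (6) and its derivative with respect to the physical density
simp  = @(xPhys) Emin + xPhys.^penal * (E0 - Emin);
dsimp = @(xPhys) penal * xPhys.^(penal - 1) * (E0 - Emin);

x      = 0.5 * ones(nely, nelx);   % uniform initial design
xTilde = filt(x);                  % remains uniform, since the weights are normalized
xPhys  = prj(xTilde);              % with eta = 0.5, the value 0.5 is mapped to 0.5
E      = simp(xPhys);              % element-wise Young modulus
```

Dividing by Hs restricts the weighted average to the neighbors that actually exist near the boundary, which is the truncated-neighborhood reading of (2); other boundary treatments would need a different padding strategy.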
Fig. 2 Geometrical setting for the MBB example
Table 1 Number of entries in the array $I$ and corresponding memory requirement for the 2D and 3D test discretizations. White background refers to the F strategy with coefficients specified as double, cyan background to the H strategy, and light green to the H strategy and elements specified as int32. The H strategy cuts $|I|$ and memory by ≈ 44% in 2D and ≈ 48% in 3D. Then, specifying the indices as int32 further cuts memory by another 50%
… the number of elements $m$, especially for 3D discretizations (see Table 1), and even though its elements are integers, the sparse function requires them to be specified as double precision numbers. The corresponding memory burden slows down the assembly process and restricts the size of problems workable on a laptop.
The efficiency of the assembly can be substantially improved by
1. Acknowledging the symmetry of both $K_e$ and $K$
2. Using an assembly routine working with iK and jK specified as integers
To understand how to take advantage of the symmetry of matrices, we refer to Fig. 1b and to the connectivity matrix $C$. Each coefficient $C_{ej} \in \mathbb{N}$ addresses the global DOF targeted by the $j$th local DOF of element $e$. Therefore, (12) explicitly reads
$$\begin{aligned} iK_e &= \{ \underbrace{\mathbf{c}_e, \mathbf{c}_e, \ldots, \mathbf{c}_e}_{d \text{ times}} \} \\ jK_e &= \{ \underbrace{c_{e1}, \ldots, c_{e1}}_{d \text{ times}}, \underbrace{c_{e2}, \ldots, c_{e2}}_{d \text{ times}}, \ldots, \underbrace{c_{ed}, \ldots, c_{ed}}_{d \text{ times}} \} \end{aligned} \qquad (13)$$
where $\mathbf{c}_e = \{c_{e1}, c_{e2}, \ldots, c_{ed}\}$ is the row corresponding to element $e$.
If we only consider the coefficients of the (lower) symmetric part of the elemental matrix $K_e^{(s)}$ and their locations into the global one $K^{(s)}$, the set of indices can be reduced to
$$\begin{aligned} iK_e &= \{ c_{e1}, \ldots, c_{ed},\; c_{e2}, \ldots, c_{ed},\; c_{e3}, \ldots, c_{ed},\; \ldots,\; c_{ed} \} \\ jK_e &= \{ \underbrace{c_{e1}, \ldots, c_{e1}}_{d \text{ times}}, \underbrace{c_{e2}, \ldots, c_{e2}}_{(d-1) \text{ times}}, \underbrace{c_{e3}, \ldots, c_{e3}}_{(d-2) \text{ times}}, \ldots, c_{ed} \} \end{aligned} \qquad (14)$$
and the overall indexing array becomes $I_r = [iK, jK] \in \mathbb{N}^{\tilde{d} \cdot m \times 2}$ where $\tilde{d} = \sum_{j=1}^{d} \sum_{i \le j} 1 = d(d+1)/2$. The entries of the indexing array and the memory usage are reduced by approx. 45% (see Table 1).
The set of indices (14) can be constructed by the following instructions (see Lines 15–21) which can be adapted to any isoparametric 2D/3D element just by changing accordingly the number $d$ of elemental DOFs. In the attached scripts, based on 4-noded bilinear Q4 and 8-noded trilinear H8 elements, we set d=8 and d=24, respectively. The last instruction sorts the indices as iKr(i) > jKr(i), such that $K^{(s)}$ contains only sub-diagonal terms.
The syntax K=sparse(iK,jK,sK) now returns the lower triangular matrix $K^{(s)}$ and we remark that the full operator can be recovered by
$$K = K^{(s)} + (K^{(s)})^T - \mathrm{diag}[K^{(s)}] \qquad (15)$$
which costs as much as the averaging operation $\frac{1}{2}(K + K^T)$, performed in top88 to get rid of roundoff errors. However, the Matlab built-in Cholesky solver and the corresponding decomposition routine can use just $K^{(s)}$, if called with the option 'lower'.
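The reduced index set (14) and the recovery formula (15) can be checked on a tiny mesh. The sketch below is not the original listing (Lines 15–21): it is an independent illustration that assumes a column-wise node numbering in the style of top88, uses a placeholder symmetric matrix instead of a real Q4 stiffness matrix, and keeps the indices in double precision because the built-in sparse requires it. The variable names are hypothetical.

```matlab
% Small Q4 mesh; sizes are illustrative assumptions
nelx = 3; nely = 2; d = 8;
nEl  = nelx * nely;
nDof = 2 * (nelx + 1) * (nely + 1);
nodeNrs = reshape(1:(nelx+1)*(nely+1), nely+1, nelx+1);
cVec = reshape(2 * nodeNrs(1:end-1, 1:end-1) + 1, nEl, 1);
cMat = cVec + [0, 1, 2*nely + [2, 3, 0, 1], -2, -1];   % connectivity matrix C

% Placeholder symmetric elemental matrix (NOT a real Q4 stiffness matrix)
Ke = min((1:d)', 1:d);

% Local index pairs (i >= j) of the lower triangle, column by column, cf. (14)
[sI, sII] = deal([]);
for j = 1:d
    sI  = [sI,  j:d];                      % rows j..d of column j
    sII = [sII, repmat(j, 1, d - j + 1)];  % column j repeated (d-j+1) times
end
iK = cMat(:, sI)';  jK = cMat(:, sII)';
Ir = sort([iK(:), jK(:)], 2, 'descend');   % enforce row index >= column index

% Assemble only the lower triangle, then recover the full operator via (15)
sK = repmat(Ke(tril(true(d))), 1, nEl);
Ks = sparse(Ir(:,1), Ir(:,2), sK(:), nDof, nDof);
K  = Ks + Ks' - diag(diag(Ks));

% Reference: full assembly with all d*d entries per element
iKf = kron(cMat, ones(d, 1))';  jKf = kron(cMat, ones(1, d))';
Kfull = sparse(iKf(:), jKf(:), repmat(Ke(:), nEl, 1), nDof, nDof);
```

Here Ir has d(d+1)/2 = 36 rows per element instead of 64, which matches the roughly 44% reduction quoted for 2D. In Matlab, a Cholesky factorization that reads only the lower triangle (for example chol(Ks,'lower')) could work on Ks directly, so forming K explicitly is only needed for the comparison.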
2216 F. Ferrari and O. Sigmund
More details on the derivation of (19) are given in Appendix A. The behavior of the estimate (19) is shown in Fig. 4b for the MBB example. The overall number of bisections ($n_{bs}$) in order to compute $\lambda_k^*$ meeting the tolerance $\tau = 10^{-8}$ when considering the initial interval $[0, \lambda_k^*]$ is cut by about 50%, compared with the one required by starting from the interval $[0, 10^9]$ as in top88. Moreover, if no projection is applied, (19) could be used together with (10) to perform an explicit Primal-Dual iteration to compute $(\mathbf{x}_{k+1}, \lambda_k^*)$ and this would reduce the number of steps even more (see green curve in Fig. 4b).
However, in the basic versions of the codes, given in Appendices B and C, we consider the bisection process and (19) is used to bracket the search interval, as this procedure is more general.
The update rule (20) is usually applied only once each $q$ steps. Thus, we can write more generally $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{z}_k$, where (Pratapa et al. 2016)
$$\mathbf{z}_k = \begin{cases} \alpha \mathbf{r}_k & \text{if } (k+1)/q \notin \mathbb{N} \\ \zeta I - (X_k + \zeta F_k)\boldsymbol{\gamma}_k & \text{if } (k+1)/q \in \mathbb{N} \end{cases} \qquad (23)$$
($\alpha \in (0, 1)$) obtaining the so-called periodic Anderson extrapolation (PAE) (Pratapa et al. 2016; Li et al. 2020).
The implementation can be obtained, e.g., by adding the following few lines after the OC step (Line 91)
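The original listing is not reproduced in this text. As a rough, independent illustration of the periodic scheme (23), the sketch below applies it to a toy linear system rather than to the TO loop. It assumes the commonly used form of the extrapolation step, $\mathbf{z}_k = \zeta \mathbf{r}_k - (X_k + \zeta F_k)\boldsymbol{\gamma}_k$ with $\boldsymbol{\gamma}_k$ the least-squares solution of $F_k \boldsymbol{\gamma} \approx \mathbf{r}_k$, where $X_k$ and $F_k$ collect the latest differences of iterates and residuals. The test matrix, the parameter values, and the history length mh are hypothetical.

```matlab
% Toy problem: solve A*x = b by a damped fixed-point iteration (assumed setup)
n = 20;
A = spdiags(ones(n, 1) * [-1 4 -1], -1:1, n, n);
b = ones(n, 1);
res = @(x) b - A * x;                 % residual r(x)

alpha = 0.2; zeta = 0.2;              % relaxation and mixing parameters (hypothetical)
q = 4; mh = 3;                        % extrapolate every q steps, keep mh differences
x = zeros(n, 1); r = res(x);
X = []; F = [];                       % histories of iterate and residual differences

for k = 0:59
    if norm(r) < 1e-10, break; end
    if mod(k + 1, q) ~= 0 || isempty(F)
        z = alpha * r;                          % plain relaxed step
    else
        gamma = F \ r;                          % least-squares fit of the residual
        z = zeta * r - (X + zeta * F) * gamma;  % Anderson extrapolation step
    end
    xNew = x + z;  rNew = res(xNew);
    X = [X, xNew - x];  F = [F, rNew - r];      % append the latest differences
    if size(X, 2) > mh, X(:, 1) = []; F(:, 1) = []; end
    x = xNew;  r = rNew;
end
```

In the TO setting, x would be the vector of design variables and r the difference between the OC-updated and the current design; how many early iterations to skip before activating the extrapolation is a tuning choice discussed in the text.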
Table 2 Comparison of convergence-related parameters for the standard (T) and accelerated (T-PAE) TO tests, for the MBB example

          it.    c       Δc             ‖r‖2/√m        mND
T1        2500   252.7   4.2 · 10^-8    1.03 · 10^-5   0.025
T1-PAE    828    258.9   4.2 · 10^-10   9.95 · 10^-7   0.021
T2        2500   246.1   5.1 · 10^-8    3.21 · 10^-5   0.023
T2-PAE    352    253.9   6.2 · 10^-9    9.97 · 10^-7   0.014
T3        2500   199.6   1.1 · 10^-4    1.91 · 10^-3   0.014
T3-PAE    752    197.5   3.7 · 10^-8    8.72 · 10^-7   0.007
T4        2500   191.8   2.0 · 10^-7    3.21 · 10^-5   0.006
T4-PAE    818    192.1   2.5 · 10^-7    9.97 · 10^-7   0.001

… 1996). However, a deeper discussion about the influence of all parameters on the convergence is outside the scope of the present work and we refer to Li et al. (2020) or, in a more general context, to Walker and Ni (2011) for this.
Results are collected in Table 2 and Figs. 5 and 6, showing the evolution of the norm of the residual, the flatness of the normalized compliance $\Delta c_k / c_0 = (c_k - c_{k-1})/c_0$ and the non-discreteness measure $m_{ND} = 100 \cdot 4\mathbf{x}^T(\mathbf{1} - \mathbf{x})/m$. We observe how Anderson acceleration substantially reduces the number of iterations needed to fulfill the stopping criterion, at the price of just a moderate increase in compliance (0.2–3%). Moreover, starting the acceleration just a few iterations later (e.g., it = 50 or it = 100 for T1) gives much lower compliance values ($c = 254.3$ and $c = 252.9$, respectively) and for T3 and T4, when the acceleration is started as the design has stabilized, compliance differences are negligible.
From Fig. 5 it is easy to notice the tendency of PAE to produce a design with some more bars. This may even give slightly stiffer structures, such as for case T3, where the non-accelerated approach removes some bars after it = 2000, whereas stopping at the design of T3–PAE gives a stiffer structure.
A comment is in order about the convergence criterion used, which is different from the one in top88 (maximum absolute change of the design variables, $\|\mathbf{x}_{k+1} - \mathbf{x}_k\|_\infty$). Here, we consider it more appropriate to check the residual with respect to the physical design field, and the 2-norm seems to give a more global measure, less affected by local oscillations.

3.4 Performance comparison to top88

We compare the performance of top99neo to the previous top88 code. In the following, we will refer to "top88" as the original code provided by Andreassen et al. (2011) and to "top88U" as its updated version making use of the sparse2 function (Davis 2009) for the assembly, with iK and jK specified as integers, and the filter implemented by using conv2.
The codes are tested by running 100 iterations for the MBB beam example (see Fig. 2), for the discretizations 300 × 100, 600 × 200, and 1200 × 400, a volume fraction $f = 0.5$ and considering mesh independent filters of radii $r_{\min} = 4$, 8, and 16, respectively. For top88 and top88U, we only consider density filtering, whereas for the new top99neo, we also consider the Heaviside projection, with the $\eta^*$ computed as described in Section 3.2. It will be apparent that the cost of this last operation is negligible.
Timings are collected in Table 3 where $t_{it}$ is the average cost per iteration, $t_A$ and $t_S$ are the overall time spent by the assembly and solver, respectively, and $t_U$ is the overall time spent for updating the design variables. For top88 and top88U, the latter consists of the OC updating and the filtering operations performed when applying the bisection on the volume constraint. For top99neo, this term accounts for the cost of the OC updating, that for estimating the Lagrange multiplier $\lambda^*$ as discussed in Section 3.2 and the filter and projection (Lines 59–70). $t_P$ collects all the preliminary operations, such as the set up of the discretization and filter, performed only once, before the TO loop starts.
From $t_{it}$, we clearly see that top99neo enhances the performance of the original top88 by 2.66, 3.85, and 5.5 times on the three discretizations, respectively. Furthermore, timings of top88 on the largest discretization (1200 × 400) relate to a smaller filter size ($r_{\min} = 12$), because of memory issues; thus, the speedup is even underestimated in this case. Comparing to the top88U version, the improvements are less pronounced (i.e., 1.55, 1.57, and 1.78 times) but still substantial. The computational cost of the new assembly strategy is very low, even comparing to the top88U version, and its weight on the overall computational cost is basically constant. Also, from Table 3, it is clear that the design variables update weighs a lot on the overall CPU time, for both top88 and top88U. On the contrary, this becomes very cheap in the new top99neo thanks to the strategies discussed in Section 3.2; $t_U$ takes about 4–5% of the overall CPU time.
Computational savings would become even higher when adopting the larger filter size $r_{\min} = 8.75$ for the mesh 300 × 100, and scaling to $r_{\min} = 17.5$ and $r_{\min} = 35$ on the two finer discretizations. For these cases, speedups with respect to top88 amount to 4.45 and 10.35 on the first two meshes, whereas for the larger one, the setup of the filter in top88 causes a memory overflow. Speedups with respect to top88U amount to 1.55, 2.55 and 3.6 times respectively.
Fig. 5 Optimized designs obtained without (left column) and with Anderson acceleration (right column) of the TO loop
Fig. 6 Evolution of some parameters related to convergence for the standard and Anderson accelerated TO process. The first row shows the normalized norm of the residual defined on physical variables, the second row shows a measure of the flatness of the objective function and the last row shows the non-discreteness measure
Table 3 Comparison of numerical performance between the old top88/top88U and new top99neo Matlab code. $t_{it}$ is the cost per iteration, $t_A$, $t_S$, $t_U$ are the overall times for assembly, equilibrium equation solve, and design update, respectively. $t_P$ is the time spent for all the preliminary operations. Values within brackets represent the % weight of the corresponding operation on the overall CPU. On the larger mesh, top88 is run with $r_{\min} = 12$, because of memory issues

        300 × 100                                 600 × 200                                   1200 × 400
        top88        top88U       top99neo       top88         top88U        top99neo        top88          top88U         top99neo
t_it    0.615        0.358        0.231          4.57          1.87          1.19            31.3           10.1           5.69
t_A     19.4 (31.5)  5.4 (15.0)   1.4 (6.1)      83.1 (18.2)   31.3 (16.7)   5.6 (4.7)       361.1 (11.6)   151.5 (15.2)   30.7 (5.4)
t_S     23.1 (37.4)  22.9 (59.3)  19.7 (85.3)    122.4 (26.8)  109.3 (58.4)  106.9 (89.7)    592.5 (19.0)   513.2 (50.9)   510.5 (89.6)
t_U     13.3 (21.6)  4.8 (13.5)   1.2 (4.8)      223.8 (48.8)  38.0 (20.3)   5.2 (4.4)       1164.2 (37.4)  310.4 (31.4)   29.2 (5.1)
t_P     0.8 (1.3)    0.06 (0.2)   0.1 (0.3)      12.9 (2.8)    0.1 (< 0.1)   0.2 (< 0.1)     92.3 (3.1)     0.5 (< 0.1)    0.6 (< 0.1)
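The speedups quoted in the text follow directly from the per-iteration costs in the first row of Table 3. A short worked check, assuming the column order top88, top88U, top99neo for each mesh (the order under which the quoted ratios are reproduced):

```matlab
% Per-iteration cost t_it from Table 3; rows: meshes, columns: top88, top88U, top99neo
tit = [0.615  0.358  0.231;    % 300 x 100
       4.57   1.87   1.19;     % 600 x 200
       31.3   10.1   5.69];    % 1200 x 400

speedup88  = tit(:, 1) ./ tit(:, 3);   % top99neo versus top88
speedup88U = tit(:, 2) ./ tit(:, 3);   % top99neo versus top88U
disp([speedup88, speedup88U]);
```

Because the table entries are already rounded, the recomputed ratios can differ from the values quoted in the text in the last digit.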
3.5 Frame reinforcement problem

Let us go back to the example of Fig. 1a, adding the specification of passive domains and a different loading condition. We may think of a practical application like a reinforcement problem for the solid frame, with thickness $t = L/50$ ($P_1$), subjected to two simultaneous loads: a vertical, uniformly distributed load with density $q = -2$ and a horizontal height-proportional load, with density $b = \pm y/L$. Some structural material has to be optimally placed within the active design domain $A$ in order to minimize the compliance, while keeping the void space ($P_0$), which may represent a service opening.
To describe this configuration, we only need to replace Lines 31–33 with the following
where lDofv and lDofh target the DOFs subjected to vertical and horizontal forces, respectively. Then, the load (Line 34) is replaced with
Figure 7 shows the two optimized designs corresponding to the two orientations of the horizontal load $b$, after 100 redesign steps. The routine top99neo has been called with the following arguments: nely=nelx=900, volfrac=0.2, penal=3, rmin=8, ft=3, eta=0.5, beta=2 and no continuation is applied. The cost per iteration is about 10.8 s and, considering the fairly large discretization of $1.62 \cdot 10^6$ DOFs, is very reasonable.
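The replacement listings are not reproduced in this text. As a loosely related sketch, the snippet below checks the quoted DOF count for the 900 × 900 mesh and shows one way a passive solid frame of thickness L/50 might be marked. The frame layout (top, left, and right members only) is an assumption made for illustration, the void region is left empty because its geometry is not restated here, and the helper name isSolid is hypothetical.

```matlab
nelx = 900; nely = 900;                    % mesh quoted in the text
nEl  = nelx * nely;
nDof = 2 * (nelx + 1) * (nely + 1);        % two DOFs per node
elNrs = reshape(1:nEl, nely, nelx);        % element numbering, column-wise

t = nelx / 50;                             % frame thickness in elements (t = L/50)
isSolid = false(nely, nelx);
isSolid(1:t, :) = true;                    % top member (assumed layout)
isSolid(:, [1:t, end-t+1:end]) = true;     % left and right members (assumed layout)

pasS = elNrs(isSolid);                     % passive solid set P1
pasV = [];                                 % passive void set P0, left empty in this sketch
act  = setdiff((1:nEl)', [pasS(:); pasV(:)]);   % active design variables A
fprintf('nDof = %d, |P1| = %d, |A| = %d\n', nDof, numel(pasS), numel(act));
```

The DOF count evaluates to 1,623,602, consistent with the $1.62 \cdot 10^6$ figure in the text.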
4 Extension to 3D
Fig. 8 Geometrical sketch of the 3D cantilever example (a) and optimized topology for $\Omega_h = 48 \times 24 \times 24$ and considering the two filter boundary conditions (b, c). The design in d corresponds to the finer mesh $\Omega_h = 96 \times 48 \times 48$ and has been obtained by replacing the direct solver with the multigrid–preconditioned CG (see Amir et al. 2014 for details)
Notable modifications are the definition of $K_e^{(s)}$ for the 8-node hexahedron (Lines 24–47) and the solution of the equilibrium (5), now performed by
… state equation solve) whereas in top3D125 this weight is cut to 7–10%. Also, the time spent for the OC update is reduced, even though the code of Amir et al. (2014) already implemented a strategy for avoiding filtering at each bisection step.
Table 4 Performance comparison between the new top3D125 code and the one from Amir et al. (2014). $t_{it}$, $t_A$, $t_S$, $t_U$, and $t_P$ have the same meaning as in Table 3 and numbers between brackets denote the % weight of the operations on the overall CPU time

$\Omega_h$    48 × 24 × 24, $r_{\min} = \sqrt{3}$    96 × 48 × 48, $r_{\min} = 2\sqrt{3}$
… can also be extended to other problems, to some extent, and Anderson acceleration is also usable in a more general setting (e.g., within MMA).
Therefore, we believe that this contribution should be helpful to all researchers and practitioners who aim at tackling TO problems on laptops, and set a solid framework for the efficient implementation of more advanced procedures.

Acknowledgments The project is supported by the Villum Fonden through the Villum Investigator Project "InnoTop." The authors are grateful to members of the TopOpt group for their useful testing of the code.

Compliance with ethical standards

Appendix A: Elaboration on the OC update

Let us consider (3) at a given design point $\mathbf{x}_k$ assuming the reciprocal and linear approximation for the compliance and volume functions, respectively (Christensen and Klarbring 2008)
$$\begin{cases} \min_{\mathbf{x} \in [\delta_-, \delta_+]^m} \; c(\mathbf{x}) \approx c_k + \sum_{e=1}^{m} \big( -x_{k,e}^2\, \partial_e c(\mathbf{x}_k) \big)\, x_e^{-1} \\ \text{s.t. } \sum_{e=1}^{m} \partial_e V(\mathbf{x}_k)\, x_e - f\,|\Omega_h| \le 0 \end{cases} \qquad (24)$$
$$L(\mathbf{x}, \lambda) = c(\mathbf{x}) + \lambda \Big( \sum_{e=1}^{m} \partial_e V(\mathbf{x}_k)\, x_e - f\,|\Omega_h| \Big)$$
… due to separability of the approximation. Let us denote the rightmost expression $x_e = F_{(j),e}(\lambda)$, and taking into account the box constraints in $\mathcal{C}$, we have
$$U(x_e) = \begin{cases} x_{(j+1),e} = \delta_- & \text{if } e \in \mathcal{L} = \{ e \mid x_{(j+1),e} \le \delta_- \} \\ x_{(j+1),e} = \delta_+ & \text{if } e \in \mathcal{U} = \{ e \mid x_{(j+1),e} \ge \delta_+ \} \\ x_{(j+1),e} = F_{(j),e} & \text{if } e \in \mathcal{M} = \{ e \mid \delta_- < x_{(j+1),e} < \delta_+ \} \end{cases} \qquad (26)$$
where $\mathcal{C} = \mathcal{L} + \mathcal{U} + \mathcal{M}$. The above is equivalent to (10).
2. We then evaluate the dual function for $\mathbf{x}_{(j+1)}$ given by (26), and the stationarity ($\partial_\lambda \psi = 0$) gives
$$\sum_{e=1}^{m} \partial_e V(\xi) \big( \chi_{\mathcal{U}}\, \delta_+ + \chi_{\mathcal{L}}\, \delta_- + F_{(j),e}(\lambda)\, \chi_{\mathcal{M}} \big) - f\,|\Omega_h| = 0$$
where $|\cdot|$ denotes the number of elements in a set.
Equations (26) and (27) can be iteratively used to compute the new solution $(\mathbf{x}_{k+1}, \lambda_k^*)$, as implemented in the code here below (again, note that lm here represents $\sqrt{\lambda}$)
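The original listing is not reproduced in this text. The following is an independent sketch of an explicit primal-dual iteration built on (26) and the stationarity condition. It assumes the usual OC expression $F_e(\lambda) = x_{k,e}\sqrt{-\partial_e c / (\lambda\, \partial_e V)}$, which follows from the stationarity of the Lagrangian of (24), a normalized domain $|\Omega_h| = 1$, and small made-up sensitivities chosen only to exercise the set updates. The variable names are hypothetical.

```matlab
% Toy data (hypothetical values chosen only to exercise the update)
x  = 0.5 * ones(4, 1);  volfrac = 0.5;  move = 0.2;
dV = ones(4, 1) / 4;                             % volume sensitivities, |Omega_h| = 1
dc = -dV .* ([0.1; 0.5; 0.6; 0.8] ./ x).^2;      % compliance sensitivities (negative)

xL  = max(x - move, 0);  xU = min(x + move, 1);  % box constraints delta-, delta+
ocP = x .* sqrt(-dc ./ dV);                      % F_e(lambda) = ocP_e / lm, lm = sqrt(lambda)

[inL, inU] = deal(false(size(x)));               % current lower and upper active sets
for it = 1:50
    inM = ~(inL | inU);                          % elements strictly inside the bounds
    % Stationarity: solve the linearized volume equation for lm on the current sets
    % (no safeguard is included for the case where the set M becomes empty)
    lm = sum(dV(inM) .* ocP(inM)) / ...
         (volfrac - sum(dV(inU) .* xU(inU)) - sum(dV(inL) .* xL(inL)));
    xTrial = ocP / lm;                           % unconstrained update
    [newL, newU] = deal(xTrial <= xL, xTrial >= xU);
    if isequal(newL, inL) && isequal(newU, inU), break; end
    [inL, inU] = deal(newL, newU);               % update the sets, cf. (26)
end
xNew = max(min(xTrial, xU), xL);                 % clipped design
```

For this data the sets stabilize after the second pass, with lm moving from 1 to 1.1 and the clipped design meeting the volume constraint exactly. Convergence of the set updates is not guaranteed in general by this sketch, which is one reason a bracketed bisection is the more general choice, as noted in Section 3.2.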