
Conversion Gate01

The document summarizes the use of optimization frameworks and search methodologies for analyzing and redesigning the Escherichia coli metabolic network. It discusses using flux balance analysis and minimization of metabolic adjustment to model cell metabolism and identify gene knockouts that improve production of desired compounds. Genetic algorithms are proposed to more efficiently search the large solution space as standard search methods are limited by computational time. Parameters for the genetic algorithm like representation, crossover, mutation rate, and population size are described.

Uploaded by

RINKU

Optimization Based Frameworks and Search Methodologies for the Analysis and Redesign of the Escherichia coli Metabolic Network.

Thesis defense by: William W. Gikandi


Major Professor: Matheos Koffas
Additional committee Members:
Prof. E. (Manolis) S. Tzanakakis
Prof. Sriram Neelamegham
Cell Modeling to Improve Naringenin Production in E. coli
Cell Modeling
• A variety of methods exist.
• Goal: identify the steady-state fluxes of a cell.
• The main ones are Flux Balance Analysis (FBA) and Minimization of Metabolic Adjustment (MOMA).
Flux Balance Analysis
Procedure
Steady State Assumption
Is it biologically justifiable to assume it?

“The steady state approximation is generally valid because of fast equilibration of metabolite concentrations (seconds) with respect to the time scale of genetic regulation (minutes)” – Segre 2002
Maximization Objective
The cell's objective is to maximize biomass.
The maximization objective = the stoichiometric sum of the components that constitute biomass.
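As a rough sketch, the steady-state assumption and the biomass objective combine into the usual FBA linear program. The symbols below are assumptions introduced for illustration and do not appear on the slides: S is the stoichiometric matrix, v the flux vector, c the vector of biomass weights, and v^min, v^max the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0 \\
& v^{\min}_i \le v_i \le v^{\max}_i \quad \text{for every flux } i
\end{aligned}
```

The equality constraint is the steady-state assumption (no net accumulation of any metabolite), and the objective is the stoichiometric sum of biomass components.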
Minimization of Metabolic Adjustment (MOMA)
• Do mutant bacteria exhibit optimal metabolic states?
• Mutants have not been subjected to the same evolutionary pressure that shaped the wild type.
• Therefore knockouts probably do not possess a mechanism for immediate regulation of fluxes toward the optimal growth configuration.
MOMA
• Hypothesis: knocked-out bacteria initially display a suboptimal flux distribution with minimal cell-wide changes in fluxes.

MOMA uses quadratic programming to approximate this behavior.
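The quadratic program behind this hypothesis can be written as follows. This is a sketch of the standard MOMA formulation; the symbols are assumptions introduced here: v^wt is the wild-type flux distribution (for example from FBA), S the stoichiometric matrix, and K the set of reactions removed by the knockout.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{i} \left( v_i - v^{\mathrm{wt}}_i \right)^2 \\
\text{subject to}\quad & S\,v = 0 \\
& v^{\min}_i \le v_i \le v^{\max}_i \quad \text{for every flux } i \\
& v_j = 0 \quad \text{for every } j \in K
\end{aligned}
```

Minimizing the squared distance to v^wt expresses "minimal cell-wide changes in fluxes"; no growth objective appears, so the solution need not be growth-optimal.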
FBA and MOMA
• MOMA calculates the initial flux distribution after a perturbation, assuming suboptimal growth.
• FBA (incorrectly) assumes perturbed cells behave optimally from the onset.
• Regulatory and kinetic effects are not accounted for.

constraints → FBA/MOMA → fluxes


Does Cell Modeling Work?
• Qualitatively predict the growth potential of mutant strains
• Qualitatively predict media-dependent uptake/secretion of protons during growth
• The average difference between experimental flux measurements and those predicted by the model was 16%
• Quantitatively describe the relationship between uptake of a primary carbon source (acetate, malate, succinate), oxygen, and maximal cellular growth rate
• Successfully identify triple-knockout gene targets that improved lycopene yield by ~40% in E. coli
FBA/ MOMA
Building the Model
[c]akg + ala-L <==> glu-L + pyr
[c]ala-L <==> ala-D
[c]asn-L + h2o --> asp-L + nh4
[c]asp-L + atp + nh4 --> amp + asn-L + h + ppi
[c]asp-L + atp + gln-L + h2o --> amp + asn-L + glu-L + h + ppi
[c]asp-L --> fum + nh4
[c]akg + asp-L <==> glu-L + oaa
[c]3mob + ala-L --> pyr + val-L
[c]ala-D + fad + h2o --> fadh2 + nh4 + pyr

Reaction list → Matrix Creator
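How the Matrix Creator works is not shown on the slide. As a minimal sketch, assuming it turns reaction strings in the notation above into a stoichiometric matrix (rows are metabolites, columns are reactions), the parsing could look like this. The function names are hypothetical.

```python
import re

def parse_side(side, compartment):
    """Parse one side of a reaction, e.g. '(3) malcoa + cmcoa'."""
    coeffs = {}
    for term in side.split(" + "):
        term = term.strip()
        if not term:
            continue  # empty side, as in exchange fluxes like '[e]cma <==>'
        match = re.match(r"^\((\d+(?:\.\d+)?)\)\s*(\S+)$", term)
        coeff, name = (float(match.group(1)), match.group(2)) if match else (1.0, term)
        if compartment:
            name = f"{name}[{compartment}]"  # move the '[c]' prefix onto each metabolite
        coeffs[name] = coeffs.get(name, 0.0) + coeff
    return coeffs

def parse_reaction(text):
    """Return (stoichiometry dict, reversible flag) for one reaction string."""
    text = text.strip()
    compartment = None
    prefix = re.match(r"^\[(\w+)\]", text)
    if prefix:
        compartment = prefix.group(1)
        text = text[prefix.end():]
    reversible = "<==>" in text
    left, right = text.split("<==>" if reversible else "-->")
    stoich = {}
    for name, coeff in parse_side(left, compartment).items():
        stoich[name] = stoich.get(name, 0.0) - coeff  # substrates are consumed
    for name, coeff in parse_side(right, compartment).items():
        stoich[name] = stoich.get(name, 0.0) + coeff  # products are produced
    return stoich, reversible

def build_matrix(reactions):
    """Return (metabolite names, matrix as list of rows, reversibility flags)."""
    parsed = [parse_reaction(r) for r in reactions]
    metabolites = sorted({name for stoich, _ in parsed for name in stoich})
    matrix = [[stoich.get(name, 0.0) for stoich, _ in parsed] for name in metabolites]
    return metabolites, matrix, [rev for _, rev in parsed]

example = [
    "[c]akg + ala-L <==> glu-L + pyr",
    "[c](3) malcoa + cmcoa --> (4) coa + chal + (3) co2",
]
metabolites, matrix, reversible = build_matrix(example)
```

Each column of the resulting matrix is one reaction, so the steady-state condition becomes one linear equation per metabolite row.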


Current Model
• 1191 total fluxes
• 932 reactions
• 259 transport and exchange fluxes
• 70 dead-end metabolites

Pathways covered: glycolysis, the TCA cycle, the pentose phosphate pathway, respiration, anaplerotic reactions, fermentative reactions, amino acid biosynthesis and degradation, nucleotide biosynthesis and interconversions, fatty acid biosynthesis and degradation, phospholipid biosynthesis, cofactor biosynthesis, and metabolite transport.
Testing the Model
[Figure: In-silico exchange fluxes obtained with the Matlab model vs. Palsson's iJR904 model. Y-axis: output (mmol/g DW-hr), -20 to 50; x-axis: exchange flux (EX_co2(e), EX_h(e), EX_h2o(e), EX_pi(e), EX_nh4(e), Biomass). Series: Matlab, Palsson.]
Similar results for anaerobic glucose, aerobic succinate, and aerobic acetate substrates.
Proton Exchange Flux
Limiting exchange of protons across system boundary
[Figure: Relative growth rate (0 to 1.2) vs. proton secretion flux (-10 to 6). Series: Acetate, Akg, Glucose-D, L-lactate, D-Lactate, Malate, Pyruvate, Succinate, Glycerol.]
Naringenin
• Reactions added

Participating Enzyme              Reaction
Coumaric acid transport           cma[e] <==> cma[c]
4-coumarate:coenzyme A ligase     [c]atp + cma + coa --> amp + ppi + cmcoa
Chalcone synthase                 [c](3) malcoa + cmcoa --> (4) coa + chal + (3) co2
Chalcone isomerase                [c]chal --> flva
Naringenin exchange flux          [e]flva <==>
Coumaric acid exchange flux       [e]cma <==>
Naringenin transport              flva[e] <==> flva[c]
Evaluate Scenarios
Gene-Protein Relationships
Gene Map
Overall Process
Standard Search

At 2 seconds/ calculation…
Primary Knockouts < 3 hours
Secondary Knockouts ~ 1 day
Tertiary Knockouts ~ 12 days
Quaternary Knockouts ~ 230 days

Combinatorial Explosion
Limited search space
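To see how quickly an exhaustive search grows, the sketch below counts k-fold knockout combinations and converts them to run time at the slide's 2 seconds per calculation. The candidate pool size of 932 (the model's reaction count) is an assumption; the slide does not state which pool its estimates were based on, so this output is not meant to reproduce the slide's figures.

```python
from math import comb

SECONDS_PER_CALC = 2      # from the slide
SECONDS_PER_DAY = 86_400
N_CANDIDATES = 932        # assumption: one knockout candidate per model reaction

def exhaustive_days(n_candidates, k, seconds_per_calc=SECONDS_PER_CALC):
    """Days needed to evaluate every k-fold knockout combination once."""
    return comb(n_candidates, k) * seconds_per_calc / SECONDS_PER_DAY

for k, level in enumerate(["primary", "secondary", "tertiary", "quaternary"], start=1):
    n_sets = comb(N_CANDIDATES, k)
    print(f"{level:>10}: {n_sets:>15,d} combinations, "
          f"{exhaustive_days(N_CANDIDATES, k):,.1f} days")
```

The number of combinations grows roughly as n^k / k!, which is why each extra knockout level multiplies the run time by orders of magnitude.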
Problem of large search space
• Time taken
• Not all of the search space is covered
• Other methods possible?
• Genetic Algorithm
Genetic Algorithm
Crossover - Recombination
Crossover combines genetic material from two parents in order to produce superior offspring.
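A minimal sketch of one crossover variant follows. It assumes the "scattered (random mix)" crossover named on the parameters slide means that each position is copied from one parent or the other with equal probability; the function name and the bit-string encoding are illustrative choices, not taken from the thesis code.

```python
import random

def scattered_crossover(parent_a, parent_b, rng=random):
    """Build a child by taking each position from a randomly chosen parent."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # True means "take this position from parent_a"
    mask = [rng.random() < 0.5 for _ in parent_a]
    return "".join(a if take_a else b
                   for a, b, take_a in zip(parent_a, parent_b, mask))

# Two chromosomes from the example population slide
child = scattered_crossover("1010011010", "1111100001", random.Random(0))
```

Positions where both parents agree are always preserved, so crossover alone cannot introduce a knockout that neither parent carries.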
Mutation
• Mutation introduces randomness into the population.
• The idea of mutation is to reintroduce divergence into a converging population.
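A bit-flip mutation is one simple way to do this. The sketch below assumes a bit-string chromosome and defaults to a rate of 1 / string length per position, the value listed on the parameters slide; the function name is hypothetical.

```python
import random

def mutate(chromosome, rate=None, rng=random):
    """Flip each bit independently with probability `rate`."""
    if rate is None:
        rate = 1.0 / len(chromosome)  # on average one flipped bit per chromosome
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[bit] if rng.random() < rate else bit
                   for bit in chromosome)

mutant = mutate("1010011010", rng=random.Random(0))
```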
Fitness Function
• The fitness function determines which solutions are better than others.
• Fitness is computed for each individual.
• Fitness = flavonoid production
Example population

No.  Chromosome  Fitness
1    1010011010  1
2    1111100001  2
3    1011001100  3
4    1010000000  1
5    0000010000  3
6    1001011111  5
7    0101010101  1
8    1011100111  2
Selection
Main idea: better individuals get a higher chance
• Chances proportional to fitness
• Roulette wheel technique

Individual  Fitness  Share of wheel
A           3        3/6 = 50%
B           1        1/6 = 17%
C           2        2/6 = 33%
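The roulette wheel can be sketched in a few lines using the three individuals from the slide. The function names are illustrative, and the fitness values are assumed to be non-negative.

```python
import random
from collections import Counter

def selection_probabilities(fitness):
    """Share of the wheel for each individual: fitness / total fitness."""
    total = sum(fitness.values())
    return {name: value / total for name, value in fitness.items()}

def roulette_select(fitness, rng=random):
    """Spin the wheel once and return the selected individual."""
    spin = rng.random() * sum(fitness.values())
    running = 0.0
    for name, value in fitness.items():
        running += value
        if spin < running:
            return name
    return name  # guard against floating-point round-off on the last slice

fitness = {"A": 3, "B": 1, "C": 2}
shares = selection_probabilities(fitness)   # A: 3/6, B: 1/6, C: 2/6

rng = random.Random(0)
counts = Counter(roulette_select(fitness, rng) for _ in range(6000))
```

Over many spins the selection counts approach the wheel shares, so fitter individuals are chosen more often without weaker ones being excluded outright.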
Stopping Criteria
• The final problem is to decide when to stop execution of the algorithm.
• There are two possible solutions to this problem:
  • First approach: stop after a fixed number of generations.
  • Second approach: stop when the improvement in average fitness over two generations is below a threshold.
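Both criteria can be combined in one check. This sketch assumes "over two generations" means the difference between the two most recent generations; the function and parameter names are hypothetical.

```python
def should_stop(avg_fitness_history, max_generations, threshold):
    """Return True when either stopping criterion is met.

    avg_fitness_history: average population fitness, one entry per generation.
    """
    # First approach: a fixed generation budget
    if len(avg_fitness_history) >= max_generations:
        return True
    # Second approach: improvement between the last two generations is too small
    if len(avg_fitness_history) >= 2:
        improvement = avg_fitness_history[-1] - avg_fitness_history[-2]
        if improvement < threshold:
            return True
    return False
```

In practice a stall counter (such as the 50 stall generations on the parameters slide) is more forgiving than a single two-generation comparison, since average fitness can plateau briefly before improving again.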


Typical behavior of an EA
Phases in optimizing on a 1-dimensional fitness landscape

Early phase:
quasi-random population distribution

Mid-phase:
population arranged around/on hills

Late phase:
population concentrated on high hills
Advantages of GAs
• Search space is not limited to the top 10 knockouts
• Supports multi-objective optimization
• Can return a family of solutions with similar fluxes
• Easy to exploit previous or alternate solutions
• May find synergistic knockouts overlooked by the standard search
Genetic Algorithm
Parameters of the GA
• Representation scheme: integer, e.g. the bit string [00100111] is stored as the knockout positions [3 6 7 8]
• Mutation rate: 1 / string length, locus restricted
• Crossover type: scattered (random mix)
• Elite children: 2
• Stall generations: 50
• Population size: 1000
• Mutation probability: simulated annealing
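The integer representation can be illustrated with the slide's own example, assuming positions are numbered from 1. The helper names are hypothetical.

```python
def bits_to_indices(bits):
    """'00100111' -> [3, 6, 7, 8]: 1-based positions of the set bits."""
    return [pos for pos, bit in enumerate(bits, start=1) if bit == "1"]

def indices_to_bits(indices, length):
    """[3, 6, 7, 8] with length 8 -> '00100111'."""
    chosen = set(indices)
    return "".join("1" if pos in chosen else "0" for pos in range(1, length + 1))

knockouts = bits_to_indices("00100111")
```

Storing only the knocked-out positions keeps each individual short when only a handful of genes out of a large genome are disrupted.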
Simulated Annealing
[Figure: Change in mutation rate. Y-axis: mutation rate (0 to 0.6); x-axis: generation % (0 to 100).]
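The exact shape of the mutation-rate curve is not recoverable from the slide. As a purely illustrative sketch, an annealing-style schedule that starts high and decays geometrically toward a small final rate could be written as below; the start rate, end rate, and decay form are all assumptions.

```python
def annealed_mutation_rate(progress, start_rate=0.5, end_rate=0.01):
    """Mutation rate as a function of run progress (0.0 = first generation,
    1.0 = last generation), decaying geometrically from start_rate to end_rate.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    return start_rate * (end_rate / start_rate) ** progress

# Rate at every 10% of the run
schedule = [annealed_mutation_rate(p / 10) for p in range(11)]
```

A high early rate favors exploration of the knockout space, while the low late rate lets the population settle on the best solutions found.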
Results:
Results: Summary
• Over 10,000 knockout (KO) results were stored by the algorithms, out of about 900,000 MOMA calculations performed.
Results: Hill Climber vs. GA
Which is better?
• Results from both methods are in agreement.
• An exhaustive combination of the 10 most frequently suggested KOs yielded no better results.
• Implication: the search space is not as chaotic as originally assumed.
Results: Effect of Gene Mapping
• More accurate prediction of the reactions affected by gene disruption
• For example, the top-yielding candidate for a primary-level knockout predicted the loss of two reactions
Results: Primary Level
• The top result predicted an increase in naringenin flux from zero (no knockouts performed) to 0.6078 mmol/g-DW/hr
• Gene: sdhC
• Reaction:
• Losing this reaction reduces the amount of fumarate available to the cell (other sources are available, e.g. glutamate degradation)
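For scale, a later chart in this deck quotes a predicted wild-type naringenin flux of 0.0002 mmol/g-DW/hr. Assuming the percent increase is computed relative to that value, the top primary knockout works out as follows (v_KO and v_WT are symbols introduced here):

```latex
\%\ \text{increase}
  = \frac{v_{\mathrm{KO}} - v_{\mathrm{WT}}}{v_{\mathrm{WT}}} \times 100
  = \frac{0.6078 - 0.0002}{0.0002} \times 100
  \approx 3.0 \times 10^{5}\,\%
```

Because the wild-type baseline is so close to zero, the percentages on that chart are very large even though the absolute fluxes are below 2 mmol/g-DW/hr.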
Results: Primary Level
Affects ATP availability?
• The second-ranked result:
• Gene: tpiA
• Pathway: glycolysis
Results: Top 3 in each level
[Figure: Top 3 simulated KO in each level. Y-axis: naringenin flux (mmol/g-DW/hr), 0 to 1.6; x-axis: KO genes, grouped as primary, secondary, tertiary, quaternary, and GA quaternary. Legible labels include Primary ('sdhC'), Primary ('tpiA'), Primary ('gnd'), Secondary ('gnd' 'sdhC'), Secondary ('glyA' 'sdhC'), Secondary ('folD' 'sdhC'), Tertiary ('gdhA' 'gnd' 'sdhC'), Tertiary ('gcd' 'glyA' 'sdhC'), and Tertiary ('mdh' 'glyA' 'sdhC').]
Results: Increase over Wild type
[Figure: % increase over predicted naringenin wildtype flux (0.0002 mmol/g-DW/hr). Y-axis: % increase of naringenin flux, 0 to 800,000; x-axis: KO genes, grouped as primary, secondary, tertiary, quaternary, and GA quaternary.]
Results: Targets
• The TCA cycle, the pentose phosphate pathway, and other biosynthetic pathways
Results: Rationalization
Precursor Availability
[Figure: Flux outputs (mmol/g-DW/hr), 0 to 5, vs. naringenin flux (mmol/g-DW/hr), 0 to 1.6. Series: malonyl-CoA ACP transacylase, acetyl-CoA carboxylase.]
• Malonyl-CoA ACP transacylase: only consumer of malonyl-CoA
• Acetyl-CoA carboxylase: produces malonyl-CoA
Naringenin/ Biomass Relationship
Competition for precursors
[Figure: Biomass flux (mmol/g-DW/hr), 0 to 0.9, vs. naringenin output (mmol/g-DW/hr), 0 to 1.6.]
Results: Diminishing Returns
[Figure: % increases of top 3 KOs over previous levels. Y-axis: % increase of naringenin flux, 0 to 300; x-axis: KO genes, grouped as primary, secondary, tertiary, and quaternary.]
Results: Diminishing Returns
[Figure: Biomass flux of top 3 KOs in each level, with the biomass threshold marked. Y-axis: biomass flux (mmol/g-DW/hr), 0 to 1; x-axis: KO genes, grouped as wild type, primary, secondary, tertiary, quaternary, and GA quaternary.]
In Conclusion
• Will all knockouts identified show increased productivity?
• In vivo results could provide an opportunity to improve the model.
• The approaches used justify some optimism regarding gene targeting for strain improvement.
• They provide a clearer understanding of the nature of the optimization goal.
Questions?
