Fuzzy Control for Inverted Pendulum

This document summarizes a study that compares different fuzzy logic control approaches for stabilizing an inverted pendulum system. Specifically, it compares a single-layer fuzzy rule set versus a two-layer rule set, and examines how different objective functions in an evolutionary algorithm affect the learned rule sets. An evolutionary algorithm is used to learn the rule sets by evaluating candidate solutions on their ability to control the nonlinear pendulum system dynamics. Results show that the objective function parameters and initial conditions can significantly impact the learned rule sets and controller performance.

Uploaded by: Mario Arela
Copyright: © Attribution Non-Commercial (BY-NC)

Learning fuzzy control laws for the inverted pendulum

R.J. Stonier, Mathematics & Decision Science, Central Qld University, Rockhampton, Australia 4702
A.J. Stacey, Dept. of Mathematics, R.M.I.T, Melbourne, Australia 3000
C. Messom, Engineering Dept., Singapore Polytechnic, Singapore 139651

Abstract

In this paper we will examine the problem of learning an efficient fuzzy logic rule set for the control of the inverted pendulum with nonlinear dynamics using an evolutionary algorithm. In particular we compare a two layered rule set with a single fuzzy logic rule set. Furthermore we look at the effect that different choices of objective function in the evolutionary algorithm have on the rule sets that are learnt.

1 Introduction

The problem of controlling an inverted pendulum, or simulated pole-cart system, has a long history, with approaches that use linear and nonlinear dynamics and include both classical and fuzzy logic control techniques; see for example [1, 2, 3, 4, 5, 6] and references therein. It has already been demonstrated that this system can be controlled by a fuzzy logic controller whose rules can be learned by a genetic algorithm. However, it is possible to start with fuzzy logic rule sets that have different structures, for instance a single rule set or a multilayered rule set. It is also known that changing the objective function in an optimization problem can lead to different solutions. These variations can all lead to different rule sets, which may be more or less effective in controlling the system. This paper is an initial investigation of these possibilities. In particular we compare a multilayered, two-level rule set with a single fuzzy logic rule set and look at the effect of including different objectives in the fitness function. Although results are given for the inverted pendulum, there are obvious implications for other systems.

2 Inverted Pendulum System

The nonlinear system to be controlled consists of the cart and a rigid pole hinged to the top of the cart. The cart is free to move left or right on a straight bounded track and the pole can swing in the vertical plane determined by the track. It is modelled by:

\[
\begin{aligned}
\dot{x}_1 &= x_2, \\
\dot{x}_2 &= \frac{U + m\ell\left(\sin(x_3)\,x_4^2 - \dot{x}_4\cos(x_3)\right)}{M + m}, \\
\dot{x}_3 &= x_4, \\
\dot{x}_4 &= \frac{g\sin(x_3) + \cos(x_3)\,\dfrac{U - m\ell x_4^2\sin(x_3)}{M + m}}{\ell\left(\dfrac{4}{3} - \dfrac{m\cos^2(x_3)}{M + m}\right)},
\end{aligned}
\]

where $x_1$ is the position of the cart, $x_2$ is the velocity of the cart, $x_3$ is the angle of the pole, $x_4$ is the angular velocity of the pole, $U$ is the control force on the cart, $m$ is the mass of the pole, $M$ is the mass of the cart, $\ell$ is the length of the pole, and $g$ is gravitational acceleration. The control force is applied to the cart to prevent the pole from falling while keeping the cart within the specified bounds on the track. We take $m = 0.1$ kg, $M = 1$ kg, $\ell = 0.5$ m, $g = 9.81$ m s$^{-2}$, with state limits $-1.0 \le x_1 \le 1.0$ and $-\pi/6 \le x_3 \le \pi/6$. Our goal is to determine the fuzzy controllers necessary to stabilise the system about the unstable reference position $x = 0$ as quickly as possible, whilst maintaining the system within the specified bounds given above.

3 Fuzzy Rule sets

Two structures for the fuzzy rule set are given. The first is a single rule base. All four of the variables $x_1$ to $x_4$ are used as inputs to the fuzzy rule set to produce a control output $U$. For convenience all input and output domains were normalised to lie in $[-1, 1]$. Actual values are re-established for the integration of the state equations. Each domain region for $x_i$ was divided into 5 overlapping intervals and assigned linguistic values NB (Negative big), NS (Negative small), CE (Center), PS (Positive small) and PB (Positive big), which we encode numerically as integers from 1 to 5 respectively. The output variable $U$ is divided into 7 overlapping regions, five as above and, appropriately, two others, NM (Negative medium) and PM (Positive medium), with integer encoding 1 to 7. All fuzzy membership functions are assumed to be triangular.

The fuzzy rule base of 625 rules may be represented by a $5 \times 5 \times 5 \times 5$ array $M[i, j, k, \ell]$, with indices referring to the variables $x_1$ to $x_4$ and having the values 1 to 5. $M[i, j, k, \ell]$ will then take values from 1 to 7. As illustration, the rule defined by $M[1, 2, 5, 4] = 2$ reads: If $x_1$ = NB and $x_2$ = NS and $x_3$ = PB and $x_4$ = PS Then $U$ = NM.

In the two fuzzy knowledge base case the first knowledge base has the two inputs $x_1$ (position of the cart) and $x_2$ (velocity of the cart) and produces as output an offset angle $W$. The offset angle is then added to $x_3$, the current angle of the pole. This updated value of $x_3$, together with $x_4$, the angular velocity of the pole, is used as input to the second knowledge base to produce the control output $U$. The input and output fuzzy sets have the same conventions as defined above, but now the two fuzzy rule bases are stored in the two arrays $M_1[i, j]$, with indices referring to $x_1$ and $x_2$ and producing a linguistic value for the offset angle $W$ as its output, and $M_2[k, \ell]$, with indices referring to $x_3$ and $x_4$ and producing a linguistic value for the control output $U$. There are a total of 50 fuzzy rules in this structure.
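As a rough illustration of the two representations just described, the following Python sketch stores the single rule base as a table with 625 entries and the two-layer structure as two tables with 25 entries each, and renders a rule in linguistic form. The names `IN_LABELS`, `OUT_LABELS` and `rule_text`, the dictionary storage, and the placeholder fill with CE are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import product

IN_LABELS = ["NB", "NS", "CE", "PS", "PB"]               # input codes 1..5
OUT_LABELS = ["NB", "NM", "NS", "CE", "PS", "PM", "PB"]  # output codes 1..7


def rule_text(indices, output, variables=("x1", "x2", "x3", "x4"), out_name="U"):
    """Render one rule from 1-based input codes and a 1-based output code."""
    conds = [f"{v} = {IN_LABELS[i - 1]}" for v, i in zip(variables, indices)]
    return "If " + " and ".join(conds) + f" Then {out_name} = {OUT_LABELS[output - 1]}"


# Single rule base: one output code for each of the 5^4 = 625 input combinations.
# Every rule is set to CE (code 4) purely as a placeholder.
M = {idx: 4 for idx in product(range(1, 6), repeat=4)}
M[(1, 2, 5, 4)] = 2  # the example rule from the text

# Two-layer structure: M1 maps (x1, x2) to the offset angle W,
# M2 maps (x3 + W, x4) to the control output U; 25 + 25 = 50 rules.
M1 = {idx: 4 for idx in product(range(1, 6), repeat=2)}
M2 = {idx: 4 for idx in product(range(1, 6), repeat=2)}

print(rule_text((1, 2, 5, 4), M[(1, 2, 5, 4)]))
print(len(M), len(M1) + len(M2))
```

The tuple keys keep the 1-based indexing of the text, so `M[(1, 2, 5, 4)]` corresponds directly to the array entry used in the example rule.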

4 Evolutionary learning

We use evolutionary algorithms [7] whose strings are integer encoded to learn the fuzzy control rules in these fuzzy knowledge bases. This heuristic search technique maintains a population of strings $P(t) = \{x_1^t, \ldots, x_n^t\}$ from iteration $t$ to the next, $t + 1$. Each string encodes a rule base which is a possible solution to the control problem. In creating the new population, proportional selection, a variant of arithmetic crossover, and a modified mutation operator are used.

The strings are formed by storing the elements of the arrays above in a one dimensional array. For example the array $M$ can be stored as a single linear array of 625 elements by linearising on the fastest changing index $\ell$. The arrays $M_1$ and $M_2$ can each be stored as a single 25 element linear array and then concatenated to form a linear array of 50 elements. A second integer is appended to each rule in the string, producing a two dimensional array, i.e. $50 \times 2$ for our example. The second integer is used to store the clipping level applied to the output fuzzy set and is used in the crossover operator.

The modified arithmetic crossover algorithm is encapsulated in the code below. It says that if a given rule j in parent1 is not used and the corresponding rule in parent2 is used, then both children inherit the rule that is used. Likewise, the reverse holds. If both rules are used or both are not used, we perform the traditional type of arithmetic crossover defined by a constant $\alpha$ with $0 < \alpha < 1$, except that rounding is used to ensure that integer entries are maintained. The purpose of the crossover algorithm is to ensure that information within the rule structure is passed to the children. In essence it is performing, in some sense, fuzzy amalgamation [8] on the fly at a local level.

    for j := 1 to length_of_string do
    begin
      { rule j unused in parent1 but used in parent2: both children copy parent2 }
      if (parent1[j,2] = 0) and (parent2[j,2] <> 0) then
      begin
        child1[j,1] := parent2[j,1];
        child2[j,1] := parent2[j,1];
        child1[j,2] := parent2[j,2];
        child2[j,2] := parent2[j,2];
      end
      { rule j used in parent1 but unused in parent2: both children copy parent1 }
      else if (parent1[j,2] <> 0) and (parent2[j,2] = 0) then
      begin
        child1[j,1] := parent1[j,1];
        child2[j,1] := parent1[j,1];
        child1[j,2] := parent1[j,2];
        child2[j,2] := parent1[j,2];
      end
      { otherwise: arithmetic crossover on both components, rounded to integers }
      else
      begin
        child1[j,1] := round(alpha*parent1[j,1] + (1-alpha)*parent2[j,1]);
        child2[j,1] := round((1-alpha)*parent1[j,1] + alpha*parent2[j,1]);
        child1[j,2] := round(alpha*parent1[j,2] + (1-alpha)*parent2[j,2]);
        child2[j,2] := round((1-alpha)*parent1[j,2] + alpha*parent2[j,2]);
      end;
    end;
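For readers who want to experiment, here is a runnable Python rendering of the string encoding and of the modified crossover described in this section. It is a sketch under assumptions: each rule is stored as a pair `[output, clip]`, a clipping level of 0 is taken to mark an unused rule (as the test on the second component in the paper's code suggests), and the names `encode` and `crossover` and the default `alpha=0.7` are ours. Python's `round` rounds exact halves to the nearest even integer, which may differ from the rounding used by the authors.

```python
def encode(m1, m2, clip=1):
    """Concatenate two linearised rule arrays into a string of [output, clip] pairs."""
    return [[out, clip] for out in list(m1) + list(m2)]


def crossover(parent1, parent2, alpha=0.7):
    """Modified arithmetic crossover on strings of [output, clip] pairs."""
    child1, child2 = [], []
    for (o1, c1), (o2, c2) in zip(parent1, parent2):
        if c1 == 0 and c2 != 0:
            # Only parent2 uses this rule: both children inherit it.
            child1.append([o2, c2])
            child2.append([o2, c2])
        elif c1 != 0 and c2 == 0:
            # Only parent1 uses this rule: both children inherit it.
            child1.append([o1, c1])
            child2.append([o1, c1])
        else:
            # Both used or both unused: rounded arithmetic crossover.
            child1.append([round(alpha * o1 + (1 - alpha) * o2),
                           round(alpha * c1 + (1 - alpha) * c2)])
            child2.append([round((1 - alpha) * o1 + alpha * o2),
                           round((1 - alpha) * c1 + alpha * c2)])
    return child1, child2


string = encode([4] * 25, [4] * 25)          # 50 x 2 string for the two-layer case
p1 = [[2, 0], [6, 3], [1, 2]]
p2 = [[5, 2], [3, 0], [7, 4]]
c1, c2 = crossover(p1, p2)
print(len(string), c1, c2)
```

In the small example the first two rules are copied from whichever parent uses them, and only the third rule is blended arithmetically.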

The mutation operator takes a probability of mutation and is defined as follows: if $x_i[k, 1]$ lies strictly between 1 and 7, it is mutated up or down by one with equal probability; if it is 1 it is mutated to 2, and if it is 7 it is mutated to 6. Further, an `elite' option is used in developing the new population from the old, and prescaling is used to enhance convergence.

The fitness of a given string in the population can be evaluated as follows. Given an initial condition of the system, we can decode each string $x_i$ into a fuzzy logic controller and apply this to the system by integrating the state equations with an Euler or Runge-Kutta algorithm with step size 0.02 over a sufficiently long time interval $[0, T]$. An objective function $f_i$, to be minimised, is then calculated based on some measures of the behaviour of the system over the time interval. Such measures might include the accumulated sum of normalised absolute deviations of $x_1$ and $x_3$ from zero, the average deviation from vertical, the average deviation from the origin, or $T - T_S$, where $T_S$, the survival time, is taken to mean the total time before the pole and cart break some bounds. A penalty of 1000 is added to the objective if the final state breaks the following bounds: $|x_1| \le 0.1$, $|x_2| \le 0.1$, $|x_3| \le \pi/24$, $|x_4| \le 3.0$. The problem is then converted to one of maximisation by defining the fitness of each string to be $F_i = \mathit{Maxfit} - f_i$ if $\mathit{Maxfit} - f_i > 0$, or $F_i = 0$ otherwise, where $\mathit{Maxfit}$ is the maximum value of the objective function in the current population. This procedure will find a rule set that applies to a single initial configuration. The current objective function has the form $f_i = \omega_1 f_1 + \omega_2 f_2 + \omega_3 f_3 + \omega_4 f_4 + \omega_5 f_5$ with
\[
f_1 = \frac{1}{N}\sum_{k=1}^{N}\frac{|x_1|}{x_{\max}}, \quad
f_2 = \frac{1}{N}\sum_{k=1}^{N}\frac{|x_2|}{\dot{x}_{\max}}, \quad
f_3 = \frac{1}{N}\sum_{k=1}^{N}\frac{|x_3|}{\theta_{\max}}, \quad
f_4 = \frac{1}{N}\sum_{k=1}^{N}\frac{|x_4|}{\dot{\theta}_{\max}},
\]

and a fifth term $f_5$ formed from the survival time $T_S$ and $N$, where $x_{\max} = 1.0$, $\theta_{\max} = \pi/6$, $\dot{x}_{\max} = 1.0$, $\dot{\theta}_{\max} = 3.0$, $N$ is the number of iteration steps, $T = 0.02 \times T_S$ and $\omega_k$ are selected positive weights. The first and second terms determine the accumulated sum of normalised absolute deviations of $x_1$ and $x_3$ from zero and the third term, when minimised, maximises the survival time.

5 Results

For the two knowledge base case with $\omega_1 = 3000.0$, $\omega_2 = 0.0$, $\omega_3 = 3000.0$, $\omega_4 = 0.0$, $\omega_5 = 5000.0$, population size 100, $\alpha = 0.7$ and a schedule for mutation probability reducing from $p_m = 0.7$ to $0.001$ over 1000 generations, we obtained similar quick convergence to two "best" knowledge bases. The following diagram shows the $x$ error for the best solution obtained from the initial configuration (0.5, [Link], 0.0). Similar behaviour was observed for $\theta$, $\dot{x}$ and $\dot{\theta}$. We observe that within the first 200 iterations (4.0 time units) the system has converged to within the specified bounds.

Figure 1: Errors in x.

It was found that the decreasing schedule of mutation probability ensured faster convergence; further, it increased the probability of successful application of the algorithm with differing initial random populations and a range of different initial configurations (see below). Differing parameters $\omega_k$ yielded substantially different convergence. For the example above, the objective function reduced quickly to 280.2, in around 400 generations, but would not reduce further in 1000 generations. A change to $\omega_1 = 300.0$, $\omega_2 = 50.0$, $\omega_3 = 50.0$, $\omega_4 = 0.0$ and $\omega_5 = 300.0$ resulted in convergence to 37.8 in less than 500 generations and a continued decrease to 29.3 in 1000 generations.

For the single knowledge base case with $\omega_1 = 1000.0$, $\omega_2 = 0.0$, $\omega_3 = 1000.0$, $\omega_4 = 0.0$, $\omega_5 = 5000.0$, population size 60, $\alpha = 0.7$ and a schedule for mutation probability reducing from $p_m = 0.6$ to $0.001$ over 400 generations, we obtained convergence to a good set of rules (416 used) for the fixed initial state above within around 600 generations. The objective function reduced to 29.38 in 1000 generations. Further, after 400 generations $|x_1| < 0.01$. Similar good bounds were obtained on the other variables, showing that the single knowledge base yielded better rules, as was expected.

To find a fuzzy controller that will control the system over a wide range of initial configurations we select a grid of $N$ initial configurations in the workspace, then either: (i) we can form a set of $N$ fuzzy logic knowledge bases, one for each initial configuration as shown above, and amalgamate these by averaging the outputs for each rule in the knowledge bases [9, 10] to produce a single fuzzy logic knowledge base that can be used effectively for a wide range of initial configurations, not just those defined by the grid; or (ii) we can determine a single fuzzy knowledge base by a single evolutionary learning algorithm by making appropriate modifications to the algorithm. See [8] for application in target tracking. Due to the limited space requirement, we report here briefly on our initial investigations using (ii).
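Option (i), amalgamating several knowledge bases by averaging the output of each rule, can be pictured with a few lines of Python. This is only a sketch under assumptions: each knowledge base is taken to be a list of integer output codes of equal length, the average is rounded half up to the nearest integer code, and the name `amalgamate` is ours. The amalgamation procedure actually used by the authors is the one described in [9, 10].

```python
def amalgamate(knowledge_bases):
    """Average the output code of each rule across knowledge bases (round half up)."""
    n = len(knowledge_bases)
    length = len(knowledge_bases[0])
    if any(len(kb) != length for kb in knowledge_bases):
        raise ValueError("all knowledge bases must have the same number of rules")
    merged = []
    for j in range(length):
        total = sum(kb[j] for kb in knowledge_bases)
        # Integer arithmetic for floor(total / n + 1/2), avoiding float rounding.
        merged.append((2 * total + n) // (2 * n))
    return merged


bases = [[1, 4, 7], [3, 4, 6], [2, 5, 6]]
print(amalgamate(bases))
```

Because the codes are ordered from NB to PB, the rounded mean of several codes is again a valid linguistic value lying between the extremes proposed by the individual knowledge bases.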

First, with a significantly increased population, typically around 400 for $N = 256$, fitness evaluation for each string is taken as an average of the fitness evaluated on a random selection of distinct initial configurations from the grid; typically we used 10. This was done in order to ensure objective performance across the grid. Second, in the elitist strategy, we typically made 4 or 5 copies each of the top 5 "best" strings in passing from the old generation to the new. These strings, when passed across, have their fitness values re-evaluated as above. This many were carried across to ensure that information in the previous population was not lost completely in re-evaluation of fitness in the new population. After terminating at a number of generations, we took in this case from the final population a string obtained as an average of the top 10 "best" strings, rounding to integer values in both components of the string after averaging. This string was taken to be the final knowledge base obtained by the algorithm. It has been found to be a very successful operation, as all the knowledge in this process appears very difficult to achieve in a single best performing string. Finally, to measure the performance of this final set of fuzzy rules, its objective value was evaluated at some 20 randomly selected different configurations in the grid. These modifications clearly increase quite drastically the amount of computation and execution time. This consideration is traded off against the successful computation of the $N$ knowledge bases and the averaging required in (i). The grid was taken uniform with maximum $|x_1| = 0.75$ and maximum $|x_3| = \pi/12$ when $N = 256$. Results so far obtained for the single knowledge base with the initial selection of $\omega_k$ given above show slow convergence and, after 1,900 generations, on average 2 to 3 terminal constraints being broken.
For the two knowledge base case with a smaller grid, $N = 50$ and maximum $|x_1| = 0.35$, after 1000 generations on average no terminal constraints are being broken.
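To make the evaluation over several initial configurations concrete, the sketch below integrates the state equations of Section 2 with an Euler step of 0.02, accumulates the normalised deviations, adds the terminal penalty of 1000, and averages the result over a random sample of grid points. Several details are assumptions rather than the authors' implementation: the survival term is taken as f5 = (N - T_S)/N with T_S counted in steps, the sums are divided by N even when a run stops early, the objective (rather than the derived fitness) is averaged, N defaults to 1000 steps, `controller` is any function from state to force (a fuzzy controller decoded from a string would be used in practice), and all function names are hypothetical.

```python
import math
import random

M_POLE, M_CART, LENGTH, G = 0.1, 1.0, 0.5, 9.81    # m, M, l, g from Section 2
DT = 0.02                                           # integration step size
X_MAX, DX_MAX, TH_MAX, DTH_MAX = 1.0, 1.0, math.pi / 6, 3.0


def derivatives(state, force):
    """Right-hand side of the cart-pole state equations."""
    x1, x2, x3, x4 = state
    total = M_CART + M_POLE
    s, c = math.sin(x3), math.cos(x3)
    dx4 = (G * s + c * (force - M_POLE * LENGTH * x4 ** 2 * s) / total) / (
        LENGTH * (4.0 / 3.0 - M_POLE * c ** 2 / total))
    dx2 = (force + M_POLE * LENGTH * (s * x4 ** 2 - dx4 * c)) / total
    return (x2, dx2, x4, dx4)


def objective(controller, x0, weights, n_steps=1000):
    """Weighted objective sum(w_k * f_k) for one initial configuration."""
    state = tuple(x0)
    sums = [0.0, 0.0, 0.0, 0.0]
    survived = n_steps
    for step in range(n_steps):
        d = derivatives(state, controller(state))
        state = tuple(v + DT * dv for v, dv in zip(state, d))    # Euler step
        if abs(state[0]) > X_MAX or abs(state[2]) > TH_MAX:       # state limit broken
            survived = step + 1
            break
        for k, limit in enumerate((X_MAX, DX_MAX, TH_MAX, DTH_MAX)):
            sums[k] += abs(state[k]) / limit
    f = [total / n_steps for total in sums]
    f.append((n_steps - survived) / n_steps)         # assumed form of f5
    value = sum(w * fk for w, fk in zip(weights, f))
    x1, x2, x3, x4 = state
    if abs(x1) > 0.1 or abs(x2) > 0.1 or abs(x3) > math.pi / 24 or abs(x4) > 3.0:
        value += 1000.0                              # terminal penalty
    return value


def averaged_objective(controller, grid, weights, sample_size=10, rng=random):
    """Average the objective over a random selection of distinct grid points."""
    chosen = rng.sample(grid, min(sample_size, len(grid)))
    return sum(objective(controller, x0, weights) for x0 in chosen) / len(chosen)


WEIGHTS = (3000.0, 0.0, 3000.0, 0.0, 5000.0)
no_force = lambda state: 0.0                         # placeholder controller
print(objective(no_force, (0.0, 0.0, 0.0, 0.0), WEIGHTS))
```

With the placeholder controller the upright equilibrium is never disturbed, so the objective there is zero, whereas any tilted start lets the pole fall and incurs both the survival term and the terminal penalty.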

6 Conclusion

We have determined, using an evolutionary learning algorithm, fuzzy rules for a single and a two layer knowledge base for the control of the inverted pendulum. Learning is in general fast from a given initial configuration. The algorithm has been extended to find knowledge bases that will effectively control the system within specified bounds from a wide range of initial configurations. The research is current. The two knowledge bases determined had only 13 rules in common with those developed experimentally for the pole-cart system constructed at the Singapore Polytechnic. The given nonlinear state equations do not represent a truly accurate model of their system. Nevertheless it is worthwhile to compare their fuzzy rules with those that we have obtained via the evolutionary learning algorithm.

References

[1] J.J.E. Slotine and W. Li, Applied Nonlinear Control, Prentice Hall, 1991.
[2] C.T. Lin and C.S.G. Lee, Neural Fuzzy Systems, Prentice Hall, 1996.
[3] M.O. Odetayo and D.R. McGregor, Genetic Algorithm for Inducing Control Rules for a Dynamic System, Proc. of the 3rd Int. Conf. on Genetic Algorithms, George Mason University, pp. 177-182, 1989.
[4] A. Varsek, T. Urbancic and B. Filipic, Genetic Algorithms in Controller Design and Tuning, IEEE Trans. on Systems, Man, and Cybernetics, Vol. 23, No. 5, pp. 1330-1339, 1993.
[5] C.W. Anderson, Learning to Control an Inverted Pendulum Using Neural Networks, IEEE Control Systems Magazine, pp. 31-36, April 1989.
[6] M.A. Lee and H. Takagi, Integrating design stages of fuzzy systems using genetic algorithms, Proc. IEEE Int. Conf. Fuzzy Systems, Vol. I, pp. 612-617, San Francisco, 1993.
[7] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 1992.
[8] R.J. Stonier and M. Mohammadian, Knowledge Acquisition for Target Capture, Proc. of the IEEE Int. Conf. on Evolutionary Computing, May 1998, Anchorage, Alaska.
[9] R.J. Stonier and M. Mohammadian, Self Learning Hierarchical Fuzzy Logic Controller in Multi-Robot Systems, Proc. of the IEA Conf. Control95, Melbourne, Australia, pp. 381-386, October 1995.
[10] R.J. Stonier and M. Mohammadian, Evolutionary Learning in Fuzzy Logic Control Systems, Complex Systems 96: From Local Interactions to Global Phenomena, Eds. [Link] et al., IOS Press, pp. 193-212, July 1996.
