Inverted Pendulum Simulation in Python
### Python Simulation of an Inverted Pendulum System
The inverted pendulum is a classic control problem, widely used to validate the effectiveness of different control strategies. Below are two approaches to simulating its behavior: one based on classical control theory and one based on modern reinforcement learning.
#### Method 1: Classical Control Based on a Linearized Model
In classical control, the system can be modeled in state-space form and a controller designed on that basis (the linearized form is written out after the code below). Here is a simple proportional-derivative (PD) controller example:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

# Simplified inverted-pendulum dynamics
def pendulum_dynamics(state, t, u):
    theta, dtheta = state   # angle and angular velocity
    g = 9.81                # gravitational acceleration
    l = 1                   # pole length
    m = 0.1                 # pole (bob) mass
    M = 1                   # cart mass
    sin_theta = np.sin(theta)
    cos_theta = np.cos(theta)
    ddtheta = (u + m * l * dtheta ** 2 * sin_theta - (M + m) * g * sin_theta) / \
              ((4/3) * (M + m) * l - m * l * cos_theta ** 2)
    return [dtheta, ddtheta]

# PD controller
def pd_controller(error, de_error, kp=5, kd=1):
    return -(kp * error + kd * de_error)

# Initial conditions
initial_state = [np.pi / 2, 0]    # initial angle pi/2 rad (the controller's setpoint), zero angular velocity
time = np.linspace(0, 10, 1000)   # time vector
state_history = []
control_input = []

for i in range(len(time) - 1):
    current_time = time[i]
    next_time = time[i + 1]
    state = initial_state if i == 0 else sol[-1]
    theta, dtheta = state
    # Control signal: regulate theta around the pi/2 setpoint
    control_signal = pd_controller(np.pi / 2 - theta, -dtheta)
    sol = odeint(pendulum_dynamics, state, [current_time, next_time], args=(control_signal,))
    state_history.append(sol[-1])
    control_input.append(control_signal)

state_history = np.array(state_history)
angles = state_history[:, 0]
angular_velocities = state_history[:, 1]

plt.figure(figsize=(10, 6))
plt.subplot(2, 1, 1)
plt.plot(time[:-1], angles, label="Angle (rad)")
plt.legend()
plt.grid()
plt.subplot(2, 1, 2)
plt.plot(time[:-1], angular_velocities, label="Angular Velocity (rad/s)", color='orange')
plt.legend()
plt.grid()
plt.tight_layout()
plt.show()
```
The code above implements a PD-controlled inverted pendulum simulation[^1]. It integrates the dynamics with `odeint` and recomputes the control input at each time step to hold the pendulum at its setpoint.
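To make the "linearized model" in the heading concrete, linearizing the simplified dynamics used in the code around small angles (with sin θ ≈ θ, cos θ ≈ 1, and the θ̇² term neglected) gives the state-space form that classical designs such as this PD controller build on. This is a sketch of the linearization of the model above, not of the full cart-pole equations:

$$
\dot{x} = A x + B u,\qquad
x = \begin{bmatrix}\theta\\ \dot{\theta}\end{bmatrix},\quad
A = \begin{bmatrix}0 & 1\\[2pt] -\dfrac{(M+m)g}{D} & 0\end{bmatrix},\quad
B = \begin{bmatrix}0\\[2pt] \dfrac{1}{D}\end{bmatrix},\quad
D = \tfrac{4}{3}(M+m)\,l - m\,l .
$$

With the values used in the code ($M=1$, $m=0.1$, $l=1$, $g=9.81$), $D \approx 1.37$.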
---
#### Method 2: Reinforcement Learning Control with a Deep Q-Network (DQN)
For more complex scenarios, the DQN algorithm from deep reinforcement learning can be used. Below is a simplified DQN framework:
```python
import random
import numpy as np
import gym
import torch
import torch.nn as nn
import torch.optim as optim
from collections import deque

class DQN(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(DQN, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, output_dim)
        )

    def forward(self, x):
        return self.fc(x)

env = gym.make('CartPole-v1')   # OpenAI Gym cart-pole environment
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = DQN(env.observation_space.shape[0], env.action_space.n).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
memory = deque(maxlen=10000)

# Hyperparameters
gamma = 0.99
epsilon_start = 1.0
epsilon_end = 0.01
epsilon_decay = 0.995
batch_size = 32

def select_action(state, epsilon):
    # Epsilon-greedy action selection
    if np.random.rand() < epsilon:
        return env.action_space.sample()
    with torch.no_grad():
        q_values = model(torch.tensor(state, dtype=torch.float32, device=device))
        return torch.argmax(q_values).item()

def train_model():
    # Sample a minibatch of transitions from the replay buffer
    minibatch = random.sample(memory, batch_size)
    states, actions, rewards, next_states, dones = zip(*minibatch)
    states_tensor = torch.tensor(np.array(states), dtype=torch.float32, device=device)
    actions_tensor = torch.tensor(actions, dtype=torch.long, device=device)
    rewards_tensor = torch.tensor(rewards, dtype=torch.float32, device=device)
    next_states_tensor = torch.tensor(np.array(next_states), dtype=torch.float32, device=device)
    dones_tensor = torch.tensor(dones, dtype=torch.bool, device=device)
    # Q-learning target: r + gamma * max_a' Q(s', a') for non-terminal transitions
    current_q = model(states_tensor).gather(1, actions_tensor.unsqueeze(-1)).squeeze()
    next_q = model(next_states_tensor).max(dim=1)[0].detach()
    target_q = rewards_tensor + gamma * next_q * (~dones_tensor)
    loss = nn.MSELoss()(current_q, target_q)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

episodes = 200
epsilon = epsilon_start
for episode in range(episodes):
    state = env.reset()[0]
    total_reward = 0
    done = False
    while not done:
        action = select_action(state, epsilon)
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        memory.append((state, action, reward, next_state, done))
        state = next_state
        total_reward += reward
        if len(memory) >= batch_size:
            train_model()
    epsilon = max(epsilon_end, epsilon * epsilon_decay)
    print(f"Episode {episode}, Total Reward: {total_reward}")
```
This code shows how to build a basic DQN model in PyTorch to solve the cart-pole balancing problem[^3]. Note that training can take a considerable amount of time, especially when running on CPU.
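After training, the learned policy can be checked with a purely greedy rollout (epsilon = 0). This is a minimal sketch that assumes the `model`, `env`, and `device` objects from the script above are still in scope; it simply plays one episode without exploration:
```python
# Minimal evaluation sketch: one greedy episode with the trained network
# (assumes `model`, `env`, and `device` from the training script above).
state = env.reset()[0]
done = False
total_reward = 0
while not done:
    with torch.no_grad():
        q_values = model(torch.tensor(state, dtype=torch.float32, device=device))
    action = torch.argmax(q_values).item()   # greedy action, no exploration
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    total_reward += reward
print(f"Greedy policy reward: {total_reward}")
```
A well-trained agent should reach the CartPole-v1 cap of 500 steps per episode; much lower scores usually mean more training episodes are needed.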
---
### Summary
Two different inverted pendulum simulation approaches are presented above: a PD controller based on traditional control theory, and the DQN algorithm based on deep reinforcement learning. Each has its own strengths and weaknesses; the right choice depends on the application scenario and technical requirements.