Table of Contents
- one neuron in the ANN
one neuron in the ANN
- The neuron model of an ANN was inspired by biological neurons; its core structure consists of inputs, weights, an activation function, and an output.
- The output of a neuron can be described by the following mathematical model (a short numerical sketch is given after the symbol list below).
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
- $x_i$: the $i$-th input, coming either from neurons in the previous layer or from the original input data.
- $w_i$: the weight associated with the input $x_i$, indicating how important that input is.
- $b$: the bias, which shifts the activation threshold of the neuron.
- $f(\cdot)$: the activation function, which applies a nonlinearity so the network can learn complex patterns.
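As a concrete illustration of the formula above, here is a minimal JAX sketch of a single forward pass; the input, weight, and bias values are made up purely for the example.

```python
import jax
import jax.numpy as jnp

# Hypothetical example values (not from the text), with n = 3 inputs
x = jnp.array([1.0, 2.0, 3.0])   # inputs x_1, x_2, x_3
w = jnp.array([0.5, -0.3, 0.8])  # weights w_1, w_2, w_3
b = 0.1                          # bias

z = jnp.dot(w, x) + b    # weighted sum: 0.5*1 - 0.3*2 + 0.8*3 + 0.1 = 2.4
y = jax.nn.sigmoid(z)    # activation f(z), about 0.92 here
print(z, y)
```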
- First, the weighted sum of the inputs (a linear transformation) is computed:

$$z = \sum_{i=1}^{n} w_i x_i + b$$
- Second, this sum is passed through an activation function $f(z)$; the nonlinear functions below are commonly used so that the network can approximate complicated functions (see the short sketch after this list).
- Sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$; its output lies in (0, 1), so it can be interpreted as a probability.
- ReLU: $f(z) = \max(0, z)$; it mitigates the vanishing-gradient problem and is widely used in hidden layers.
- Tanh: $f(z) = \tanh(z)$; its output lies in (-1, 1), which keeps the data centered around zero.
- Softmax: used in the output layer for multi-class classification; it converts the outputs into a probability distribution.
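The short sketch below, added only for illustration, applies these four activations in JAX to the same example pre-activation values so their output ranges can be compared.

```python
import jax
import jax.numpy as jnp

z = jnp.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # example pre-activation values

print("sigmoid:", jax.nn.sigmoid(z))  # each value in (0, 1)
print("relu:   ", jax.nn.relu(z))     # max(0, z), negatives clipped to 0
print("tanh:   ", jnp.tanh(z))        # each value in (-1, 1)
print("softmax:", jax.nn.softmax(z))  # non-negative values that sum to 1
```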
- To handle a batch of data stored in a matrix $\mathbf{X}$, the computation takes the following form:
$$\mathbf{y} = f(\mathbf{X} \mathbf{w} + \mathbf{b})$$
Here $\mathbf{w}$ is the weight vector and $\mathbf{b}$ is the bias vector. This matrix multiplication can be accelerated on a GPU.
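As a minimal sketch of this batched form (assuming a toy 4×2 input matrix), a loop over individual samples and the single matrix expression give the same result, but the matrix version maps directly onto GPU-friendly matrix multiplication.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (4, 2))  # batch of 4 samples with 2 features each
w = jnp.array([0.5, -0.3])          # weight vector
b = 0.1                             # bias, broadcast over the batch

# Per-sample computation, one input row at a time
y_loop = jnp.stack([jax.nn.sigmoid(jnp.dot(x, w) + b) for x in X])

# Batched computation: y = f(Xw + b) as one matrix-vector product
y_batch = jax.nn.sigmoid(X @ w + b)

print(jnp.allclose(y_loop, y_batch))  # True
```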
- The entire process of a neuron's operation, including training, can be illustrated with the following Python code using JAX.
```python
import jax
import jax.numpy as jnp
from jax import grad, vmap, jit
import matplotlib.pyplot as plt

# ------------------------------
# 1. Define the neuron model
# ------------------------------
def neuron(params, x):
    """A single neuron with an activation function."""
    z = jnp.dot(x, params['w']) + params['b']  # weighted sum + bias
    return jax.nn.sigmoid(z)                   # sigmoid activation (could be replaced with relu/tanh)

# ------------------------------
# 2. Initialize parameters and hyperparameters
# ------------------------------
input_dim = 2          # number of input features
learning_rate = 0.1
epochs = 1000

# Randomly initialize the weights and bias
key = jax.random.PRNGKey(42)
params = {
    'w': jax.random.normal(key, (input_dim,)),  # weight vector
    'b': 0.0                                    # bias
}

# ------------------------------
# 3. Generate synthetic data (OR logic gate)
# ------------------------------
X = jnp.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
y = jnp.array([0, 1, 1, 1])  # outputs of the OR gate

# ------------------------------
# 4. Define the loss function and gradient computation
# ------------------------------
@jit  # JIT compilation for speed
def loss_fn(params, X_batch, y_batch):
    """Mean squared error loss."""
    predictions = vmap(neuron, in_axes=(None, 0))(params, X_batch)  # batched predictions
    return jnp.mean((predictions - y_batch) ** 2)

compute_grads = grad(loss_fn)  # automatic differentiation

# ------------------------------
# 5. Training loop
# ------------------------------
loss_history = []
for epoch in range(epochs):
    # Compute gradients and loss
    grads = compute_grads(params, X, y)
    loss = loss_fn(params, X, y)
    loss_history.append(loss)

    # Gradient-descent parameter update
    params = {
        'w': params['w'] - learning_rate * grads['w'],
        'b': params['b'] - learning_rate * grads['b']
    }

    # Print progress every 100 epochs
    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# ------------------------------
# 6. Visualize the results
# ------------------------------
# Plot the loss curve
plt.plot(loss_history)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training Loss")
plt.show()

# ------------------------------
# 7. Test the predictions
# ------------------------------
# Define a batched prediction function
predict = vmap(neuron, in_axes=(None, 0))

# Evaluate on the training data
predictions = predict(params, X)
print("\nPredictions:")
for x, pred in zip(X, predictions):
    print(f"Input: {x}, Output: {pred:.4f} → Predicted class: {int(pred > 0.5)}")

# ------------------------------
# 8. Print the trained parameters
# ------------------------------
print("\nTrained parameters:")
print(f"weights: {params['w']}")
print(f"bias: {params['b']}")
```