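### Reproducing Faster R-CNN Training with Keras
#### Understanding the Principles of Object Detection
In object detection, the R-CNN family is a classic line of work. Each generation improved both accuracy and speed: first the original R-CNN, then Fast R-CNN, and now the widely used Faster R-CNN[^1].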
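#### Building the Feature Extraction Network
To recognize objects in an image reliably, a strong feature extractor is essential. The usual backbone is a convolutional network pretrained on ImageNet, such as VGG or ResNet. Loading these pretrained weights (transfer learning) speeds up convergence and improves generalization.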
```python
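from keras.applications import VGG16
# ImageNet-pretrained VGG16 without the classifier head; input size left flexible
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(None, None, 3))
# Freeze the backbone so that only the newly added RPN/detection layers are trained
for layer in base_model.layers:
    layer.trainable = False
# Faster R-CNN takes features from conv5_3 (stride 16), not block5_pool (stride 32)
feature_map = base_model.get_layer('block5_conv3').output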
```
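The anchor generator in the dataset step assumes a stride of 16. That stride corresponds to VGG16's conv5_3 layer (`block5_conv3` in Keras), which sits behind four 2×2 max-pooling layers. By contrast, `base_model.output` with `include_top=False` is the `block5_pool` output, which has a stride of 32. The hypothetical helper below estimates how large the feature map is for a given input. It is a minimal sketch that assumes size-preserving `'same'` convolutions and Keras' default `'valid'` padding for the 2×2 pools, so odd sizes are floored:

```python
def vgg16_feature_map_size(height, width, num_pools=4):
    """Spatial size of the VGG16 feature map after `num_pools` 2x2 max-pools.

    Hypothetical helper: 3x3 'same' convolutions keep the size unchanged and
    each 2x2 / stride-2 'valid' max-pool halves it, rounding down.
    num_pools=4 -> block5_conv3 (stride 16); num_pools=5 -> block5_pool (stride 32).
    """
    for _ in range(num_pools):
        height, width = height // 2, width // 2
    return height, width


# A 600 x 1000 input (shorter side 600, a common Faster R-CNN training scale)
print(vgg16_feature_map_size(600, 1000))     # (37, 62)
print(vgg16_feature_map_size(600, 1000, 5))  # (18, 31)
```

For that input, the paper's default of 9 anchors per location means the RPN scores 37 × 62 × 9 = 20,646 anchors.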
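#### Designing the Region Proposal Network (RPN)
The RPN slides over the shared feature map. At each position it scores a set of anchor boxes and predicts offsets that refine them. The highest-scoring foreground anchors, after the offsets are applied, become the proposals. The RPN is not a pair of fully connected layers. It is a 3×3 convolution followed by two sibling 1×1 convolutional branches, which are equivalent to small fully connected layers applied at every sliding-window position. The classification branch predicts an objectness score (foreground vs. background) for each anchor. The regression branch predicts the four bounding-box offsets relative to that anchor[^2].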
```python
import tensorflow as tf
from keras.layers import Conv2D

def build_rpn(base_layers, num_anchors):
    """RPN head: a shared 3x3 conv followed by two sibling 1x1 conv branches."""
    x = Conv2D(512, (3, 3), padding='same', activation='relu')(base_layers)
    # Classification branch: one objectness score (object vs. background) per anchor
    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid')(x)
    # Regression branch: 4 box offsets per anchor, relative to that anchor (linear output)
    x_regr = Conv2D(num_anchors * 4, (1, 1))(x)
    return [x_class, x_regr]
```
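`num_anchors` is the number of anchors placed at each feature-map position. The Faster R-CNN paper uses 3 scales (box areas of 128², 256² and 512² pixels) and 3 aspect ratios (1:1, 1:2, 2:1), which gives 9 anchors per position. The sketch below generates such a set as `(bx, by, bw, bh)` rows, the layout that `create_train_anchors` in the dataset step unpacks. `generate_base_anchors` is a hypothetical name. Treating `bw` and `bh` as widths and heights in pixels, with zero centre offsets, is an assumption:

```python
import numpy as np


def generate_base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return base anchors as rows of (bx, by, bw, bh).

    Hypothetical helper. bx, by: offset of the anchor centre from the cell
    centre (0 here); bw, bh: width and height in input-image pixels.
    `ratios` are height / width; every anchor keeps an area of scale**2.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            # Solve w * h = scale**2 subject to h / w = ratio
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            anchors.append([0.0, 0.0, w, h])
    return np.array(anchors)


base_anchors = generate_base_anchors()
num_anchors = len(base_anchors)
print(num_anchors)  # 9
```

With these 9 base anchors, calling `build_rpn` with `num_anchors=9` produces 9 objectness channels and 36 (9 × 4) regression channels per feature-map position.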
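#### Defining Custom Loss Functions
Unlike an ordinary classification task, the RPN loss only includes anchors labeled positive or negative. Anchors labeled "ignore" (neither clearly foreground nor clearly background) take no part in backpropagation. In practice, a mask on top of the binary cross-entropy zeroes out the ignored anchors, and the box-regression loss is computed only over positive anchors[^3].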
```python
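import tensorflow as tf
from keras import backend as K  # Keras 2.x / tf.keras backend API


def rpn_loss_cls(y_true, y_pred):
    """RPN objectness loss: binary cross-entropy over positive/negative anchors only.

    y_true has the same shape as y_pred: 1 = positive, 0 = negative, -1 = ignored.
    """
    mask = tf.cast(tf.not_equal(y_true, -1.), 'float32')
    # Zero out the -1 labels so the cross-entropy is well defined; the mask drops them anyway
    cls_loss = K.binary_crossentropy(y_true * mask, y_pred)  # element-wise, one value per anchor
    normalizer = K.maximum(K.sum(mask), 1.)
    return K.sum(cls_loss * mask) / normalizer


def smooth_l1(x):
    """Element-wise smooth L1 (sigma = 1): 0.5 * x**2 if |x| < 1, else |x| - 0.5."""
    abs_x = K.abs(x)
    min_abs_x = K.minimum(abs_x, 1.)
    return 0.5 * ((abs_x - 1.) * min_abs_x + abs_x)


def rpn_loss_regr(y_true, y_pred):
    """RPN box-regression loss: smooth L1 (with sigma) over positive anchors only.

    Assumes y_pred has shape (batch, num_anchors, 4) and y_true has shape
    (batch, num_anchors, 5): y_true[..., :-1] holds the regression targets and
    y_true[..., -1] is 1 for positive anchors, 0 otherwise.
    """
    sigma_squared = 9.  # sigma = 3
    diff = y_pred - y_true[..., :-1]
    abs_diff = K.abs(diff)
    # Add a trailing axis so the per-anchor mask broadcasts over the 4 coordinates
    mask = K.expand_dims(tf.cast(tf.equal(y_true[..., -1], 1.), 'float32'), -1)
    normalizer = K.maximum(K.sum(mask), 1.)
    # The switch point is 1 / sigma^2 so the quadratic and linear branches meet continuously
    loss = tf.where(
        K.less(abs_diff, 1. / sigma_squared),
        0.5 * sigma_squared * K.square(diff),
        abs_diff - 0.5 / sigma_squared,
    )
    return K.sum(mask * loss) / normalizer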
```
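The regression targets stored in `y_true[..., :-1]` are, in the Faster R-CNN paper, offsets relative to the anchor rather than raw coordinates:

- `t_x = (x - x_a) / w_a`
- `t_y = (y - y_a) / h_a`
- `t_w = log(w / w_a)`
- `t_h = log(h / h_a)`

Here `(x, y, w, h)` are the ground-truth box's centre, width and height, and the subscript `a` marks the anchor. The NumPy sketch below encodes and decodes these offsets for boxes in `(cx, cy, w, h)` format. The function names are hypothetical. Some implementations also normalize the targets by fixed standard deviations; that refinement is omitted here:

```python
import numpy as np


def encode_boxes(gt, anchors):
    """Encode ground-truth boxes as (tx, ty, tw, th) offsets relative to anchors.

    Hypothetical helper: both arrays have shape (N, 4) in (cx, cy, w, h) format.
    """
    tx = (gt[:, 0] - anchors[:, 0]) / anchors[:, 2]
    ty = (gt[:, 1] - anchors[:, 1]) / anchors[:, 3]
    tw = np.log(gt[:, 2] / anchors[:, 2])
    th = np.log(gt[:, 3] / anchors[:, 3])
    return np.stack([tx, ty, tw, th], axis=1)


def decode_boxes(deltas, anchors):
    """Invert encode_boxes: apply (tx, ty, tw, th) offsets to anchors, returning (cx, cy, w, h)."""
    cx = deltas[:, 0] * anchors[:, 2] + anchors[:, 0]
    cy = deltas[:, 1] * anchors[:, 3] + anchors[:, 1]
    w = np.exp(deltas[:, 2]) * anchors[:, 2]
    h = np.exp(deltas[:, 3]) * anchors[:, 3]
    return np.stack([cx, cy, w, h], axis=1)


anchors = np.array([[100.0, 100.0, 128.0, 128.0]])
gt = np.array([[110.0, 96.0, 160.0, 120.0]])
deltas = encode_boxes(gt, anchors)
print(np.allclose(decode_boxes(deltas, anchors), gt))  # True
```

At inference time, `decode_boxes` is the step that turns the RPN regression output into proposal coordinates.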
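#### Preparing the Dataset and Generating Labels
Besides the images themselves, each image needs its anchor boxes and their ground-truth labels. This step involves three parts:

1. Read the raw annotation files (XML or JSON).
2. Parse the object locations they describe.
3. Map those locations onto the feature map at the corresponding scale[^5].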
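For Pascal VOC-style XML files, Python's standard `xml.etree.ElementTree` is enough to read the boxes. The sketch below assumes the usual VOC layout: a `<size>` element with `<width>` and `<height>`, and one `<object>` per box with a `<name>` and a `<bndbox>` holding `xmin`, `ymin`, `xmax` and `ymax`. JSON formats such as COCO would need a different parser. Anchors live in the coordinate frame of the resized network input, so the boxes are rescaled by the same factor as the image. `parse_voc_xml` and `resize_scale` are hypothetical names. The resize rule (shorter side 600, longer side at most 1000) follows the scales reported in the Faster R-CNN paper:

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text):
    """Parse a Pascal VOC annotation string into (width, height, objects).

    Hypothetical helper: each object is (class_name, xmin, ymin, xmax, ymax)
    in original-image pixels.
    """
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    objects = []
    for obj in root.iter('object'):
        bb = obj.find('bndbox')
        coords = [float(bb.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        objects.append((obj.find('name').text, *coords))
    return width, height, objects


def resize_scale(width, height, min_side=600, max_side=1000):
    """Scale so the shorter side becomes min_side without the longer side exceeding max_side."""
    scale = min_side / min(width, height)
    if max(width, height) * scale > max_side:
        scale = max_side / max(width, height)
    return scale


sample = """<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object><name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""

w, h, objs = parse_voc_xml(sample)
s = resize_scale(w, h)  # 600 / 375 = 1.6
# Boxes in the resized-image frame, where the anchors are defined
scaled = [(name, x1 * s, y1 * s, x2 * s, y2 * s) for name, x1, y1, x2, y2 in objs]
```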
```python
import numpy as np


# Example: create training anchors from the feature-map size and predefined base anchors
def create_train_anchors(feature_size, base_anchors, stride=16):
    """Return all anchors as rows of (cx, cy, w, h) in input-image pixels.

    feature_size is (rows, cols) of the feature map; stride is the backbone's
    downsampling factor (16 for VGG16 conv5_3).
    """
    anchors = []
    for row in range(feature_size[0]):
        for col in range(feature_size[1]):
            # Centre of this feature-map cell, projected back onto the input image
            cx = col * stride + stride // 2
            cy = row * stride + stride // 2
            for bx, by, bw, bh in base_anchors:
                anchors.append([
                    cx + bx,  # anchor centre x: cell centre plus the base anchor's x offset
                    cy + by,  # anchor centre y: cell centre plus the base anchor's y offset
                    bw,       # anchor width (assumed to be in input-image pixels)
                    bh,       # anchor height (assumed to be in input-image pixels)
                ])
    return np.array(anchors).reshape(-1, 4)


# Assigns ground-truth labels (positive / negative / ignored) to the anchors during training
def assign_labels_to_proposals(gt_boxes, all_anchors, iou_threshold_positive=0.7, iou_threshold_negative=0.3):
    ...
```
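`assign_labels_to_proposals` is left as a stub above. The sketch below shows one way to fill in that step, under a different, hypothetical name (`assign_anchor_labels`). It follows the labeling rules described in the Faster R-CNN paper:

- An anchor is positive if its IoU with some ground-truth box is at least 0.7, or if it is the best-matching anchor for some ground-truth box.
- An anchor is negative if its maximum IoU is below 0.3.
- Every other anchor is ignored (`-1`), matching the `-1` convention of the masked classification loss.

Boxes are assumed to be in `(x1, y1, x2, y2)` corner format. Anchors produced in `(cx, cy, w, h)` format must be converted first, so a conversion helper is included. Two further steps from the paper are omitted for brevity: sampling 256 anchors per image with up to half positives, and discarding cross-boundary anchors.

```python
import numpy as np


def cxcywh_to_xyxy(boxes):
    """Convert (cx, cy, w, h) rows to (x1, y1, x2, y2) corner format."""
    cx, cy, w, h = boxes.T
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)


def iou_matrix(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) corner-format boxes -> (N, M)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def assign_anchor_labels(gt_boxes, anchors, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor 1 (object), 0 (background) or -1 (ignored).

    Hypothetical sketch; both inputs are in corner format.
    Returns (labels, index of the best-matching ground-truth box per anchor).
    """
    ious = iou_matrix(anchors, gt_boxes)  # (num_anchors, num_gt)
    max_iou = ious.max(axis=1)
    matched_gt = ious.argmax(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int32)
    labels[max_iou < neg_thresh] = 0
    # Each ground-truth box claims its highest-IoU anchor(s), so every object gets
    # at least one positive even when no IoU reaches pos_thresh (IoU 0 never counts)
    best_per_gt = ious.max(axis=0)
    best_anchor_idx = np.where((ious == best_per_gt[None, :]) & (ious > 0))[0]
    labels[best_anchor_idx] = 1
    labels[max_iou >= pos_thresh] = 1
    return labels, matched_gt


anchors = cxcywh_to_xyxy(np.array([
    [50.0, 50.0, 100.0, 100.0],    # identical to the ground-truth box
    [60.0, 60.0, 100.0, 100.0],    # partial overlap (IoU about 0.68)
    [300.0, 300.0, 100.0, 100.0],  # no overlap
]))
gt = np.array([[0.0, 0.0, 100.0, 100.0]])
labels, matched = assign_anchor_labels(gt, anchors)
print(labels.tolist())  # [1, -1, 0]
```

The `ious > 0` guard matters: without it, a ground-truth box that overlaps no anchor would turn every zero-IoU anchor into a positive.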