TensorFlow Multi-GPU Training

Article link: https://blog.csdn.net/lly1122334/article/details/118931338

Contents

Problem Description
Solution
Improving GPU Utilization
References

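Problem Description

This article covers training on a single machine with multiple GPUs. For multi-machine training, consult the references.

Solution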

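How tf.distribute.MirroredStrategy works:

1. Before training starts, the strategy places a complete copy of the model on each of the N GPUs.
2. Each time a batch is fed in, the batch is split into N parts, one per GPU.
3. Each of the N GPUs uses its local variables to compute the gradients for its own part of the data.
4. A distributed All-reduce operation exchanges the gradients efficiently between the GPUs and sums them.
5. Each GPU updates its local variables with the summed gradients.
6. Once every device has updated its local variables, the next training step begins.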
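The steps above can be imitated in a few lines of plain Python. This is a minimal sketch, not TensorFlow's implementation: the model is a hypothetical one-variable linear regression, the "replicas" are ordinary lists rather than GPUs, and the helper names (local_gradient, all_reduce_sum, train_step) are made up for illustration. Each replica divides its gradient by the global batch size, so the summed gradient equals the gradient of the mean loss over the whole batch.

```python
def local_gradient(weights, shard, global_batch_size):
    """Gradient of the squared error on one shard, scaled by the global batch size."""
    w, b = weights
    grad_w = grad_b = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        grad_w += 2.0 * err * x / global_batch_size
        grad_b += 2.0 * err / global_batch_size
    return [grad_w, grad_b]


def all_reduce_sum(per_replica_grads):
    """Sum the gradients element-wise and hand every replica the same result."""
    summed = [sum(values) for values in zip(*per_replica_grads)]
    return [list(summed) for _ in per_replica_grads]


def train_step(replica_weights, batch, learning_rate):
    n = len(replica_weights)
    shards = [batch[i::n] for i in range(n)]            # split the batch into N parts
    grads = [local_gradient(w, s, len(batch))           # per-replica gradients
             for w, s in zip(replica_weights, shards)]
    reduced = all_reduce_sum(grads)                     # exchange and sum gradients
    return [[p - learning_rate * g for p, g in zip(w, g_sum)]   # local update
            for w, g_sum in zip(replica_weights, reduced)]


num_replicas = 4
replica_weights = [[0.0, 0.0] for _ in range(num_replicas)]   # identical copies
batch = [(float(x), 3.0 * x + 1.0) for x in range(8)]         # toy data: y = 3x + 1

for _ in range(100):                                          # repeat training steps
    replica_weights = train_step(replica_weights, batch, learning_rate=0.01)

# All replicas start identical and apply identical updates, so they stay in sync.
print(all(w == replica_weights[0] for w in replica_weights))  # prints True
```

Because every replica applies the same summed gradient to the same starting values, the copies never drift apart, which is why the strategy is described as "mirrored".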
By default, MirroredStrategy in TensorFlow uses NVIDIA NCCL to perform the All-reduce operation.
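To give a feel for what an all-reduce does, here is a toy ring all-reduce in plain Python. It is a sketch under assumptions: the article does not say which all-reduce algorithm runs in this setup, the ring scheme is only one well-known way to implement the operation, and the function name ring_all_reduce and the list-based "devices" are hypothetical. Each device passes one chunk per step to its right-hand neighbour, first to accumulate sums (reduce-scatter), then to circulate the finished chunks (all-gather).

```python
def ring_all_reduce(buffers):
    """Return a summed copy of the buffers for every device, using a ring schedule."""
    n = len(buffers)
    length = len(buffers[0])
    # Split each buffer into n contiguous chunks (sizes may differ by one).
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]
    data = [list(b) for b in buffers]  # work on copies, leave the inputs untouched

    # Phase 1, reduce-scatter: after n - 1 steps device i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot all messages first so the sends happen "simultaneously".
        messages = []
        for i in range(n):
            chunk = (i - step) % n
            lo, hi = bounds[chunk]
            messages.append((chunk, data[i][lo:hi]))
        for i, (chunk, payload) in enumerate(messages):
            dst = (i + 1) % n
            lo, hi = bounds[chunk]
            for k, value in zip(range(lo, hi), payload):
                data[dst][k] += value

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            chunk = (i + 1 - step) % n
            lo, hi = bounds[chunk]
            messages.append((chunk, data[i][lo:hi]))
        for i, (chunk, payload) in enumerate(messages):
            dst = (i + 1) % n
            lo, hi = bounds[chunk]
            data[dst][lo:hi] = payload

    return data


# Four pretend devices, each holding its own gradient vector.
grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], [1000, 2000, 3000, 4000]]
reduced = ring_all_reduce(grads)
print(reduced[0])  # [1111, 2222, 3333, 4444]
```

In this schedule every device sends and receives only one chunk per step, so no single device has to collect all the gradients by itself.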

Installation

pip install tensorflow-datasets --upgrade

Before using MirroredStrategy

import tensorflow as tf
import tensorflow_datasets as tfds


def resize(image, label):
    """Preprocess an image: resize to 224x224 and scale pixel values to [0, 1]."""
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, label


batch_size = 64
# as_supervised=True yields (image, label) tuples
dataset = tfds.load('cats_vs_dogs', split=tfds.Split.TRAIN, as_supervised=True)
dataset = dataset.map(resize).shuffle(1024).batch(batch_size)

# Train from scratch (weights=None) with two output classes
model = tf.keras.applications.MobileNetV2(weights=None, classes=2)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=[tf.keras.metrics.sparse_categorical_accuracy]
)

model.fit(dataset, epochs=5)
# Epoch 1/5
# 364/364 [==============================] - 110s 303ms/step - loss: 0.6229 - sparse_categorical_accuracy: 0.6500
# Epoch 2/5
# 364/364 [==============================] - 111s 305ms/step - loss: 0.4781 - sparse_categorical_accuracy: 0.7690
# Epoch 3/5
# 364/364 [==============================] - 110s 301ms/step - loss: 0.3919 - sparse_categorical_accuracy: 0.8202
# Epoch 4/5
# 364/364 [==============================] - 113s 311ms/step - loss: 0.3171 - sparse_categorical_accuracy: 0.8602
# Epoch 5/5
# 364/364 [==============================] - 113s 311ms/step - loss: 0.2532 - sparse_categorical_accuracy: 0.8919

After using MirroredStrategy

import tensorflow as tf
import tensorflow_datasets as tfds


def resize(image, label):
    """Preprocess an image: resize to 224x224 and scale pixel values to [0, 1]."""
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, label


# With no arguments, MirroredStrategy uses all GPUs visible to TensorFlow
strategy = tf.distribute.MirroredStrategy()
batch_size = 64 * strategy.num_replicas_in_sync  # global batch size = per-replica batch size × number of devices

dataset = tfds.load('cats_vs_dogs', split=tfds.Split.TRAIN, as_supervised=True)
dataset = dataset.map(resize).shuffle(1024).batch(batch_size)

# Build and compile the model inside the scope so its variables are mirrored on every device
with strategy.scope():
    model = tf.keras.applications.MobileNetV2(weights=None, classes=2)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.sparse_categorical_crossentropy,
        metrics=[tf.keras.metrics.sparse_categorical_accuracy]
    )

print('Number of devices: {}'.format(strategy.num_replicas_in_sync))
model.fit(dataset, epochs=5)
# Number of devices: 4
# Epoch 1/5
# 91/91 [==============================] - 35s 390ms/step - loss: 0.6459 - sparse_categorical_accuracy: 0.6374
# Epoch 2/5
# 91/91 [==============================] - 34s 377ms/step - loss: 0.5499 - sparse_categorical_accuracy: 0.7225
# Epoch 3/5
# 91/91 [==============================] - 34s 373ms/step - loss: 0.4560 - sparse_categorical_accuracy: 0.7826
# Epoch 4/5
# 91/91 [==============================] - 35s 382ms/step - loss: 0.3811 - sparse_categorical_accuracy: 0.8285
# Epoch 5/5
# 91/91 [==============================] - 34s 379ms/step - loss: 0.3274 - sparse_categorical_accuracy: 0.8558
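
The step counts and epoch times in the two logs can be cross-checked with a little arithmetic. The sketch below assumes the cats_vs_dogs train split holds 23,262 examples (a figure not stated in this article, though it is consistent with both 364 and 91 steps); the per-epoch times are copied from the logs above, and the helper name steps_per_epoch is made up for illustration.

```python
import math

NUM_EXAMPLES = 23262       # assumption: size of the cats_vs_dogs train split
PER_REPLICA_BATCH = 64     # per-GPU batch size used in both scripts


def steps_per_epoch(num_examples, per_replica_batch, num_replicas):
    """Number of batches per epoch when the last partial batch is kept."""
    global_batch = per_replica_batch * num_replicas
    return math.ceil(num_examples / global_batch)


print(steps_per_epoch(NUM_EXAMPLES, PER_REPLICA_BATCH, 1))  # 364
print(steps_per_epoch(NUM_EXAMPLES, PER_REPLICA_BATCH, 4))  # 91

# Seconds per epoch, read from the training logs above.
single_gpu_epoch_s = [110, 111, 110, 113, 113]
four_gpu_epoch_s = [35, 34, 34, 35, 34]

speedup = sum(single_gpu_epoch_s) / sum(four_gpu_epoch_s)
efficiency = speedup / 4   # fraction of ideal 4x scaling

print(round(speedup, 2))     # 3.24
print(round(efficiency, 2))  # 0.81
```

Each step takes longer with four GPUs (about 380 ms versus about 305 ms), but an epoch needs only a quarter of the steps, which gives roughly a 3.2x reduction in time per epoch in these runs.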