Fine-Tuning Large Models: Integrating DeepSpeed with Accelerate in Practice

1. Configuring accelerate config

You can either run accelerate config for an interactive setup, or edit the generated config file directly.

accelerate config

The interactive prompts look roughly like this:

(Qwen2-5-vl) zx@s-System-Product-Name:/whut_data/Yujg$ accelerate config
------------------------------------------------------------------------
In which compute environment are you running?
This machine
------------------------------------------------------------------------
Which type of machine are you using?
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1
Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]: no
Do you wish to optimize your script with torch dynamo?[yes/NO]:no
Do you want to use DeepSpeed? [yes/NO]: yes
Do you want to specify a json file to a DeepSpeed config? [yes/NO]: no
------------------------------------------------------------------------
What should be your DeepSpeed's ZeRO optimization stage?
2
------------------------------------------------------------------------
Where to offload optimizer states?
none
------------------------------------------------------------------------
Where to offload parameters?
none
How many gradient accumulation steps you're passing in your script? [1]: 1
Do you want to use gradient clipping? [yes/NO]: yes
What is the gradient clipping value? [1.0]: 1
Do you want to enable `deepspeed.zero.Init` when using ZeRO Stage-3 for constructing massive models? [yes/NO]: no
Do you want to enable Mixture-of-Experts training (MoE)? [yes/NO]: no
How many GPU(s) should be used for distributed training? [1]:2
------------------------------------------------------------------------
Do you wish to use mixed precision?
bf16
accelerate configuration saved at /home/zx/.cache/huggingface/accelerate/default_config.yaml

When the configuration is complete, a default_config.yaml file is generated; you can open it and adjust the settings directly.

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:  # present only if you chose DeepSpeed during the interactive setup; can also be supplied in code via DeepSpeedPlugin
  gradient_accumulation_steps: 1 # number of gradient accumulation steps
  gradient_clipping: 1.0
  offload_optimizer_device: none # offload optimizer states to CPU to reduce GPU memory, at the cost of slower training
  offload_param_device: none     # offload parameters to CPU to reduce GPU memory, at the cost of slower training
  zero3_init_flag: false
  zero_stage: 2                  # ZeRO optimization stage: 1, 2, or 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1                  # number of machines: 1 (single node)
num_processes: 2                 # number of processes: 2 (one per GPU)
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

There are two ways to provide the DeepSpeed configuration:

(1) Run accelerate config interactively and answer the DeepSpeed-related prompts.

Do you want to use DeepSpeed? [yes/NO]: yes

(2) Pass a DeepSpeedPlugin to Accelerator in your code.

from accelerate import Accelerator, DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_clipping=1.0)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
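
If you need DeepSpeed settings beyond the keyword arguments shown above, the plugin can also be pointed at a full DeepSpeed JSON config. A minimal sketch, assuming a config file named ds_config.json next to your script (the filename is illustrative, not part of the original example):

from accelerate import Accelerator, DeepSpeedPlugin

# ds_config.json is a hypothetical path to a standard DeepSpeed config file
deepspeed_plugin = DeepSpeedPlugin(hf_ds_config="ds_config.json")
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)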

2. Training a Model with Accelerator in Practice

We use a simple neural network as the example.

1. Import the required packages

from accelerate import Accelerator, DeepSpeedPlugin
import torch
from torch.utils.data import TensorDataset, DataLoader

2. Define a simple network

class SimpleNet(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleNet, self).__init__()
        self.fc1 = torch.nn.Linear(input_dim, hidden_dim)
        self.fc2 = torch.nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

3. The main function

def main():
    # Network and data parameters
    input_dim = 10
    hidden_dim = 20
    output_dim = 2
    batch_size = 64
    data_size = 10000

    # Randomly generate input data and labels
    input_data = torch.randn(data_size, input_dim)
    labels = torch.randn(data_size, output_dim)

    # Build the dataset and dataloader
    dataset = TensorDataset(input_data, labels)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)  # shuffle=True randomizes sample order

    # Create the model
    model = SimpleNet(input_dim, hidden_dim, output_dim)

    # DeepSpeed settings: already provided in the yaml config, so nothing needs to be passed here
    # deepspeed_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1, gradient_clipping=1.0)
    # accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
    accelerator = Accelerator()

    # Create the optimizer and loss function
    optimizer = torch.optim.Adam(model.parameters(), lr=0.00015)
    criterion = torch.nn.MSELoss()

    # Required step when using Accelerator: pass the model, dataloader and optimizer through .prepare
    model, dataloader, optimizer = accelerator.prepare(model, dataloader, optimizer)

    # Distributed training loop
    for epoch in range(1000):
        model.train()
        for batch in dataloader:
            inputs, labels = batch
            outputs = model(inputs)
            loss = criterion(outputs, labels)

            optimizer.zero_grad()        # clear gradients
            accelerator.backward(loss)   # must use accelerator.backward instead of loss.backward()
            optimizer.step()
        print(f"Epoch{epoch} loss{loss.item()}")

    # Unwrap the model before saving so a plain state_dict is written
    accelerator.wait_for_everyone()
    accelerator.save(accelerator.unwrap_model(model).state_dict(), "model.pth")


if __name__ == '__main__':
    main()
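
Beyond saving just the model weights, Accelerate can also checkpoint the full training state (model, optimizer, RNG state, and the DeepSpeed engine state). A minimal sketch, meant to be called inside main() after prepare; the directory name checkpoints is illustrative:

# Save everything needed to resume training
accelerator.save_state("checkpoints")

# Later, to resume from that checkpoint:
accelerator.load_state("checkpoints")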

4. Run

accelerate launch --config_file acc_dp_cfg.yaml acc_dp.py

The script must be started with accelerate launch, using --config_file to point at your own yaml config file.
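
Most settings can also be overridden on the command line at launch time. For example (assuming the same script and config file as above), the process count can be set explicitly:

accelerate launch --config_file acc_dp_cfg.yaml --num_processes 2 acc_dp.py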

Output:

Epoch0 loss0.9848306179046631
Epoch0 loss0.9311917424201965
Epoch1 loss1.0081628561019897
Epoch1 loss0.8537667989730835
Epoch2 loss1.153598666191101
Epoch2 loss1.3423147201538086
Epoch3 loss0.8436083197593689
Epoch3 loss0.9412739276885986
Epoch4 loss0.8727619647979736
Epoch4 loss0.9575796127319336
Epoch5 loss1.285043478012085
Epoch5 loss1.1377384662628174
Epoch6 loss1.0347694158554077
Epoch6 loss1.0022046566009521
Epoch7 loss1.0058305263519287
Epoch7 loss1.0705955028533936
Epoch8 loss1.0044686794281006
Epoch8 loss1.0007789134979248

With 2-GPU distributed training, each process prints its own loss, so two values appear per epoch.
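
If you prefer a single averaged loss per epoch, the per-process values can be combined before printing. A minimal sketch that would replace the print inside the training loop above (averaging across processes is an assumption, not something the original script does):

# Average the last batch's loss across all processes and print it once on the main process
avg_loss = accelerator.reduce(loss.detach(), reduction="mean")
if accelerator.is_main_process:
    print(f"Epoch{epoch} loss{avg_loss.item()}")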
