[Federated Learning Framework FLGo] FLGo Run Configuration (1): Initialization and Datasets

Article link: https://blog.csdn.net/C__King_6/article/details/143563075

Contents

1. Initialization
2. Datasets

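1. Initialization

The flgo.init() function creates a runner; calling the runner's run method starts the iterative training.

The flgo.init function

The function lives in /flgo/utils/fflow.py and is defined as: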

def init(task: str, algorithm, option = {}, model=None, Logger: flgo.experiment.logger.BasicLogger = None, Simulator: BasicSimulator=flgo.simulator.DefaultSimulator, scene='horizontal'):

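init accepts the following inputs:

task: path of the federated task, as generated by flgo.gen_task;
algorithm: the federated algorithm; it must be a class or a module, and in horizontal federated learning it must expose the two attributes algorithm.Server and algorithm.Client;
option (optional): run options, a dictionary holding the runtime parameters;
model (optional): the model to optimize; it must be a class or a module, and in horizontal federated learning it must expose the two methods model.init_global_module and model.init_local_module, which initialize the module for each entity;
Logger (optional): logger class; its parent class must be flgo.experiment.logger.BasicLogger;
Simulator (optional): system simulator class; its parent class must be flgo.system_simulator.BasicSimulator;
scene (optional): a string naming the federated scenario (e.g. horizontal or vertical).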
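The attribute requirements listed above can be checked up front with plain duck typing. The following is a minimal sketch, not part of FLGo: check_horizontal_inputs, ToyAlgorithm and ToyModel are hypothetical names introduced for illustration, and the single obj argument of the two model methods is an assumption.

```python
import types


def check_horizontal_inputs(algorithm, model=None):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    # algorithm must be a class or a module exposing Server and Client
    if not isinstance(algorithm, (type, types.ModuleType)):
        problems.append("algorithm must be a class or a module")
    for name in ("Server", "Client"):
        if not hasattr(algorithm, name):
            problems.append(f"algorithm is missing attribute {name}")
    # model is optional; when given it must expose the two init methods
    if model is not None:
        for name in ("init_global_module", "init_local_module"):
            if not callable(getattr(model, name, None)):
                problems.append(f"model is missing method {name}")
    return problems


class ToyAlgorithm:
    """Hypothetical stand-in that has the two required attributes."""
    class Server: ...
    class Client: ...


class ToyModel:
    """Hypothetical stand-in; the `obj` parameter name is an assumption."""
    @staticmethod
    def init_global_module(obj): ...

    @staticmethod
    def init_local_module(obj): ...


print(check_horizontal_inputs(ToyAlgorithm, ToyModel))
```

Running such a check before flgo.init can surface a missing Server, Client or init method earlier than a failure deep inside the runner would.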

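In flgo, the run parameters of each runner are specified by the option dictionary passed to init.

The table below lists the meaning and usage of each parameter:
[Table: parameter list]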
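Because option is an ordinary dictionary, several runs can share a base configuration and override only what differs. The sketch below uses only keys that appear in this article's examples; make_option and BASE_OPTION are hypothetical helpers, not part of FLGo, and the values are illustrative.

```python
# Keys taken from the examples in this article; values are illustrative only.
BASE_OPTION = {"gpu": 0, "num_rounds": 5, "num_epochs": 1}


def make_option(**overrides):
    """Return a fresh option dict: BASE_OPTION updated with the overrides."""
    option = dict(BASE_OPTION)  # copy, so the shared base is never mutated
    option.update(overrides)
    return option


short_run = make_option()
long_run = make_option(num_rounds=50, learning_rate=0.1, batch_size=64, eval_interval=10)
print(short_run)
print(long_run)
```

Copying the base before updating keeps one run's overrides from leaking into another run.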

Example: training with the fedavg algorithm on the MNIST dataset using flgo:

import flgo
import flgo.benchmark.mnist_classification as mnist
import flgo.algorithm.fedavg as fedavg
import os
import flgo.experiment.analyzer as al


# Task path
task = '/kaggle/working/test_mnist'
# Data partition: split the MNIST dataset IID across 100 clients
config = {
    'benchmark': {'name': 'flgo.benchmark.mnist_classification'},
    'partitioner': {'name': 'IIDPartitioner', 'para': {'num_clients': 100}}
}
# Generate the task only if it does not exist yet
if not os.path.exists(task):
    flgo.gen_task(config, task_path=task)

# Plot the validation loss and validation accuracy recorded during training
analysis_plan = {
    'Selector': {
        'task': task,
        'header': ['fedavg']
    },
    'Painter': {
        'Curve': [
            {'args': {'x': 'communication_round','y': 'val_loss'}, 'fig_option': {'title': 'valid loss on mnist'}},
            {'args': {'x': 'communication_round','y': 'val_accuracy'},'fig_option': {'title': 'valid accuracy on mnist'}},
        ]
    }
}

# Initialize a runner and call its run method to start the iterative training
# Train for 5 communication rounds with 1 local epoch per round
fedavg_runner = flgo.init(task, fedavg, {'num_rounds': 5, 'num_epochs': 1, 'gpu': 0})
fedavg_runner.run()

al.show(analysis_plan)

Training results:

[Figure: loss]

[Figure: accuracy]

2. Datasets

In FLGo, each partition of a dataset corresponds to one federated task.

Generating federated tasks with different data partitions:

import flgo
import flgo.benchmark.mnist_classification as mnist
import os

"""
benchmark specifies the dataset (benchmark) name
partitioner specifies the data partitioner
"""
config_iid = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'IIDPartitioner','para':{'num_clients':100}}}
config_div01 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DiversityPartitioner','para':{'num_clients':100, 'diversity':0.1}}}
config_div05 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DiversityPartitioner','para':{'num_clients':100, 'diversity':0.5}}}
config_div09 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DiversityPartitioner','para':{'num_clients':100, 'diversity':0.9}}}
config_dir01 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DirichletPartitioner','para':{'num_clients':100, 'alpha':0.1}}}
config_dir10 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DirichletPartitioner','para':{'num_clients':100, 'alpha':1.0}}}
config_dir50 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DirichletPartitioner','para':{'num_clients':100, 'alpha':5.0}}}
# Dictionary mapping each task path to its data-partition config
task_dict = {
    './mnist_iid': config_iid,
    './mnist_div01': config_div01,
    './mnist_div05': config_div05,
    './mnist_div09': config_div09,
    './mnist_dir01': config_dir01,
    './mnist_dir10': config_dir10,
    './mnist_dir50': config_dir50,
}
# Generate each task at its path, skipping tasks that already exist
for task in task_dict:
    if not os.path.exists(task):
        flgo.gen_task(task_dict[task], task)
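The seven config dictionaries above differ only in the partitioner name and its parameters, so they can also be produced by a small factory. This is a sketch in plain Python; make_config and BENCHMARK are hypothetical names introduced here, and the resulting task_dict is intended to have the same shape as the one written out by hand.

```python
BENCHMARK = "flgo.benchmark.mnist_classification"


def make_config(partitioner, num_clients=100, **para):
    """Build one task config in the same shape as the dictionaries above."""
    return {
        "benchmark": {"name": BENCHMARK},
        "partitioner": {
            "name": partitioner,
            "para": {"num_clients": num_clients, **para},
        },
    }


# Path suffix is the parameter value times 10, zero-padded to two digits
task_dict = {
    "./mnist_iid": make_config("IIDPartitioner"),
    **{
        f"./mnist_div{round(d * 10):02d}": make_config("DiversityPartitioner", diversity=d)
        for d in (0.1, 0.5, 0.9)
    },
    **{
        f"./mnist_dir{round(a * 10):02d}": make_config("DirichletPartitioner", alpha=a)
        for a in (0.1, 1.0, 5.0)
    },
}

for path, config in task_dict.items():
    print(path, config["partitioner"])
```

The generation loop shown above would work unchanged on this task_dict, since it only iterates over the paths and passes each config to flgo.gen_task.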

Training fedavg on these tasks with the same set of parameters:
Use flgo.init to set up a runner for each task, then start each runner.

import flgo.algorithm.fedavg as fedavg
option = {'gpu':0, 'num_rounds':50, 'num_epochs':1, 'learning_rate':0.1, 'batch_size':64, 'eval_interval':10}
runners = [flgo.init(task, fedavg, option) for task in task_dict]
for runner in runners:
    runner.run()

Effect of data heterogeneity on the federated algorithm fedavg:

import flgo
import flgo.benchmark.mnist_classification as mnist
import flgo.algorithm.fedavg as fedavg
import os
import flgo.experiment.analyzer as al
import matplotlib.pyplot as plt

"""
Define the dataset partition for each task
benchmark specifies the dataset (benchmark) name
partitioner specifies the data partitioner
"""
config_iid = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'IIDPartitioner','para':{'num_clients':100}}}
config_div01 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DiversityPartitioner','para':{'num_clients':100, 'diversity':0.1}}}
config_dir01 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DirichletPartitioner','para':{'num_clients':100, 'alpha':0.1}}}
# Dictionary mapping each task path to its data-partition config
task_dict = {
    '/kaggle/working/mnist_iid': config_iid,
    '/kaggle/working/mnist_div01': config_div01,
    '/kaggle/working/mnist_dir01': config_dir01,
}
# Generate each task at its path, skipping tasks that already exist
for task in task_dict:
    if not os.path.exists(task):
        flgo.gen_task(task_dict[task], task)

option = {'gpu':0, 'num_rounds':50, 'num_epochs':1}
runners = [flgo.init(task, fedavg, option) for task in task_dict]
for runner in runners:
    runner.run()

# Read the experiment records with flgo.experiment.analyzer.Selector, then plot them with matplotlib
div_recs = al.Selector({'task': [t for t in task_dict],'header': ['fedavg']})

plt.subplot(121)
for task in div_recs.tasks:
    rec_list = div_recs.records[task]
    for rec in rec_list:
        plt.plot(rec.data['communication_round'],rec.data['test_loss'],label=task.split('/')[-1])
plt.title('test loss on mnist')
plt.xlabel('communication_round')
plt.ylabel('test_loss')
plt.legend()

plt.subplot(122)
for task in div_recs.tasks:
    rec_list = div_recs.records[task]
    for rec in rec_list:
        plt.plot(rec.data['communication_round'],rec.data['test_accuracy'],label=task.split('/')[-1])
plt.title('test accuracy on mnist')
plt.xlabel('communication_round')
plt.ylabel('test_accuracy')
plt.legend()

plt.show()

Experimental results:
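Comparing the results on the IID, Diversity and Dirichlet partitions shows that, as data heterogeneity increases, the accuracy fedavg reaches within a given number of rounds drops. This gives a first indication of the negative effect of data heterogeneity on federated learning. For this reason, many algorithms have been proposed to improve the convergence speed and model accuracy of federated algorithms on various kinds of non-IID data.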
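To make "stronger heterogeneity" concrete for the Dirichlet case, the sketch below samples a class-proportion vector per client from a symmetric Dirichlet distribution and reports how dominant the largest class is on average. This is an illustration of how alpha controls label skew in general, not FLGo's DirichletPartitioner implementation; dirichlet and mean_max_share are hypothetical helpers, and drawing one proportion vector per client is an assumption about the setup.

```python
import random


def dirichlet(alpha, k, rng):
    """Sample a k-dim symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_max_share(alpha, num_clients=100, num_classes=10, seed=0):
    """Average over clients of the largest class proportion in each client's label mix."""
    rng = random.Random(seed)
    shares = [max(dirichlet(alpha, num_classes, rng)) for _ in range(num_clients)]
    return sum(shares) / num_clients


# Smaller alpha concentrates each client's data on fewer classes
for alpha in (0.1, 1.0, 5.0):
    print(alpha, round(mean_max_share(alpha), 2))
```

A value near 1 means a typical client is dominated by a single class, while a value near 1/num_classes means the labels are close to uniform; smaller alpha pushes the value toward 1, which matches alpha=0.1 being the most heterogeneous setting used in this article.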