1. Initialization
Call flgo.init() to create a runner, then call its run method to start iterative training.
The flgo.init function
The function lives in /flgo/utils/fflow.py and is defined as:
def init(task: str, algorithm, option = {}, model=None, Logger: flgo.experiment.logger.BasicLogger = None, Simulator: BasicSimulator=flgo.simulator.DefaultSimulator, scene='horizontal'):
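init accepts the following inputs:
- task: the federated task path, as generated by flgo.gen_task;
- algorithm: the federated algorithm; it must be a class or a module, and in horizontal FL it must expose the two attributes algorithm.Server and algorithm.Client;
- option (optional): the run options, a dict holding the runtime parameters;
- model (optional): the model module to optimize; it must be a class or a module, and in horizontal FL it must expose the two methods model.init_global_module and model.init_local_module, which initialize the module for each entity;
- Logger (optional): the logger class; it must subclass flgo.experiment.logger.BasicLogger;
- Simulator (optional): the system simulator class; it must subclass flgo.system_simulator.BasicSimulator;
- scene (optional): a string naming the federated scenario (e.g. horizontal or vertical).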
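The attribute requirements in the list above can be sketched as a small duck-typing check. This is only an illustration in plain Python: `missing_attributes`, `MyAlgorithm`, and `MyModel` are hypothetical names invented here, the method parameters are placeholders, and nothing below is part of flgo or reflects how flgo.init validates its arguments.

```python
def missing_attributes(obj, required):
    """Return the names in `required` that `obj` does not expose."""
    return [name for name in required if not hasattr(obj, name)]


# Hypothetical stand-in for an algorithm: a class exposing Server and Client.
class MyAlgorithm:
    class Server:
        pass

    class Client:
        pass


# Hypothetical stand-in for a model: a class exposing the two init methods.
# The `entity` parameter is a placeholder chosen for this sketch.
class MyModel:
    @staticmethod
    def init_global_module(entity):
        pass

    @staticmethod
    def init_local_module(entity):
        pass


# Attribute names taken from the parameter list above (horizontal FL).
ALGORITHM_ATTRS = ('Server', 'Client')
MODEL_ATTRS = ('init_global_module', 'init_local_module')

print(missing_attributes(MyAlgorithm, ALGORITHM_ATTRS))  # []
print(missing_attributes(MyModel, MODEL_ATTRS))          # []
print(missing_attributes(object(), ALGORITHM_ATTRS))     # ['Server', 'Client']
```

A check like this only confirms that the names exist; it says nothing about whether the classes behave correctly during training.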
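In flgo, the run parameters of each runner are specified by the option dict passed to init.
The table below lists the meaning and usage of each parameter: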
Example: training the fedavg algorithm on the MNIST dataset with flgo:
import flgo
import flgo.benchmark.mnist_classification as mnist
import flgo.algorithm.fedavg as fedavg
import os
import flgo.experiment.analyzer as al
# Task path
task = '/kaggle/working/test_mnist'
# Dataset partition: split MNIST across 100 clients in an IID fashion
config = {
    'benchmark': {'name': 'flgo.benchmark.mnist_classification'},
    'partitioner': {'name': 'IIDPartitioner', 'para': {'num_clients': 100}}
}
# Generate the task only if it does not exist yet
if not os.path.exists(task):
    flgo.gen_task(config, task_path=task)
# Plot the validation loss and validation accuracy recorded during training
analysis_plan = {
    'Selector': {
        'task': task,
        'header': ['fedavg']
    },
    'Painter': {
        'Curve': [
            {'args': {'x': 'communication_round', 'y': 'val_loss'}, 'fig_option': {'title': 'valid loss on mnist'}},
            {'args': {'x': 'communication_round', 'y': 'val_accuracy'}, 'fig_option': {'title': 'valid accuracy on mnist'}},
        ]
    }
}
# Create a runner and call its run method to start iterative training
# 5 communication rounds, 1 local epoch per round
fedavg_runner = flgo.init(task,fedavg,{'num_rounds': 5, 'num_epochs': 1, 'gpu': 0})
fedavg_runner.run()
al.show(analysis_plan)
Training results (figures: validation loss and validation accuracy on MNIST versus communication round).
2. Datasets
In FLGo, each partition of a dataset corresponds to one federated task.
Generating federated tasks with different data partitions:
import flgo
import flgo.benchmark.mnist_classification as mnist
import os
"""
benchmark: the name of the dataset (benchmark module)
partitioner: the data partitioner to use
"""
config_iid = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'IIDPartitioner','para':{'num_clients':100}}}
config_div01 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DiversityPartitioner','para':{'num_clients':100, 'diversity':0.1}}}
config_div05 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DiversityPartitioner','para':{'num_clients':100, 'diversity':0.5}}}
config_div09 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DiversityPartitioner','para':{'num_clients':100, 'diversity':0.9}}}
config_dir01 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DirichletPartitioner','para':{'num_clients':100, 'alpha':0.1}}}
config_dir10 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DirichletPartitioner','para':{'num_clients':100, 'alpha':1.0}}}
config_dir50 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DirichletPartitioner','para':{'num_clients':100, 'alpha':5.0}}}
# Map each task path to its partition config
task_dict = {
    './mnist_iid': config_iid,
    './mnist_div01': config_div01,
    './mnist_div05': config_div05,
    './mnist_div09': config_div09,
    './mnist_dir01': config_dir01,
    './mnist_dir10': config_dir10,
    './mnist_dir50': config_dir50,
}
# Generate each task at its path unless it already exists
for task in task_dict:
    if not os.path.exists(task):
        flgo.gen_task(task_dict[task], task)
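To give a feel for what a parameter such as alpha controls, here is a minimal sketch of the label-Dirichlet scheme commonly used to simulate non-IID data: for every class, client shares are drawn from a symmetric Dirichlet distribution, and smaller alpha values tend to concentrate a class on fewer clients. It uses only the Python standard library. The functions `dirichlet` and `dirichlet_partition` are hypothetical helpers written for this illustration; they are not FLGo's DirichletPartitioner and may differ from it in detail.

```python
import random
from collections import defaultdict


def dirichlet(alpha, k, rng):
    """Sample one point from a symmetric Dirichlet(alpha) over k entries."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for tiny alpha
        return [1.0 / k] * k
    return [d / total for d in draws]


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        shares = dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c, share in enumerate(shares):
            cumulative += share
            if c == num_clients - 1:
                end = len(indices)  # last client takes whatever is left
            else:
                end = min(len(indices), round(cumulative * len(indices)))
            clients[c].extend(indices[start:end])
            start = end
    return clients


# Toy labels: 1000 samples, 10 balanced classes (illustrative only).
labels = [i % 10 for i in range(1000)]
for alpha in (0.1, 5.0):
    parts = dirichlet_partition(labels, num_clients=10, alpha=alpha)
    classes_per_client = [len({labels[i] for i in p}) for p in parts]
    print(alpha, classes_per_client)
```

Printing the number of distinct classes held by each client is a quick way to see how skewed a given partition is.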
Training fedavg on these tasks with the same set of parameters:
Use flgo.init to create a runner for each task, then start them.
import flgo.algorithm.fedavg as fedavg
option = {'gpu':0, 'num_rounds':50, 'num_epochs':1, 'learning_rate':0.1, 'batch_size':64, 'eval_interval':10}
runners = [flgo.init(task, fedavg, option) for task in task_dict]
for runner in runners:
    runner.run()
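For context, the aggregation step usually associated with FedAvg is an average of the clients' parameters weighted by their local sample counts. The sketch below shows that step on plain Python lists. `fedavg_aggregate` is a hypothetical helper written for this illustration, not the code in flgo.algorithm.fedavg, whose aggregation behaviour depends on its own options.

```python
def fedavg_aggregate(client_weights, client_sizes):
    """Average parameter vectors, weighting each client by its sample count."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("one sample count is required per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two toy clients: the second holds three times as much data,
# so the result is pulled toward its parameters.
global_weights = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

Because each client's contribution is proportional to its data size, clients with skewed local data can pull the global model in different directions, which is one reason heterogeneous partitions are harder for this kind of averaging.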
The effect of data heterogeneity on the federated algorithm fedavg:
import flgo
import flgo.benchmark.mnist_classification as mnist
import flgo.algorithm.fedavg as fedavg
import os
import flgo.experiment.analyzer as al
import matplotlib.pyplot as plt
"""
Configure the dataset for each task
benchmark: the name of the dataset (benchmark module)
partitioner: the data partitioner to use
"""
config_iid = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'IIDPartitioner','para':{'num_clients':100}}}
config_div01 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DiversityPartitioner','para':{'num_clients':100, 'diversity':0.1}}}
config_dir01 = {'benchmark':{'name':'flgo.benchmark.mnist_classification'},'partitioner':{'name': 'DirichletPartitioner','para':{'num_clients':100, 'alpha':0.1}}}
# Map each task path to its partition config
task_dict = {
    '/kaggle/working/mnist_iid': config_iid,
    '/kaggle/working/mnist_div01': config_div01,
    '/kaggle/working/mnist_dir01': config_dir01,
}
# Generate each task at its path unless it already exists
for task in task_dict:
    if not os.path.exists(task):
        flgo.gen_task(task_dict[task], task)
option = {'gpu':0, 'num_rounds':50, 'num_epochs':1}
runners = [flgo.init(task, fedavg, option) for task in task_dict]
for runner in runners:
    runner.run()
# Load the experiment records with flgo.experiment.analyzer.Selector, then plot them with matplotlib
div_recs = al.Selector({'task': list(task_dict), 'header': ['fedavg']})
# Left panel: test loss of each task
plt.subplot(121)
for task in div_recs.tasks:
    rec_list = div_recs.records[task]
    for rec in rec_list:
        plt.plot(rec.data['communication_round'], rec.data['test_loss'], label=task.split('/')[-1])
plt.title('test loss on mnist')
plt.xlabel('communication_round')
plt.ylabel('test_loss')
plt.legend()
# Right panel: test accuracy of each task
plt.subplot(122)
for task in div_recs.tasks:
    rec_list = div_recs.records[task]
    for rec in rec_list:
        plt.plot(rec.data['communication_round'], rec.data['test_accuracy'], label=task.split('/')[-1])
plt.title('test accuracy on mnist')
plt.xlabel('communication_round')
plt.ylabel('test_accuracy')
plt.legend()
plt.show()
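Experimental results:
Comparing the results on the IID, Diversity, and Dirichlet partitions shows that the accuracy fedavg reaches within a fixed number of rounds drops as data heterogeneity increases, a first indication of the negative effect of data heterogeneity on federated learning. For this reason, many algorithms have been proposed to improve the convergence speed and model accuracy of federated learning under various kinds of non-IID data.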