
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 6, JUNE 2024

Uncertainty-Aware Aggregation for Federated Open Set Domain Adaptation

Zixuan Qin, Liu Yang, Member, IEEE, Fei Gao, Qinghua Hu, Senior Member, IEEE, and Chenyang Shen

Abstract— Open set domain adaptation (OSDA) methods have been proposed to leverage the difference between the source and target domains, as well as to recognize the known and unknown classes in the target domain. Such methods typically require the entire source and target data simultaneously to train the target model. However, in real scenarios, data are distributed and stored in various clients. They cannot be exchanged among clients because of privacy protection. Federated learning (FL) is a decentralized approach for training an effective global model with the training data distributed among the clients. Despite its potential in addressing the privacy concerns of data sharing, FL methods for OSDA that can handle unknown classes are not yet available. To tackle this problem, we have developed a novel federated OSDA (FOSDA) algorithm. More specifically, FOSDA adopts an uncertainty-aware mechanism to generate a global model from all client models. It reduces the uncertainty of the federated aggregation by focusing on the contribution of source clients with high uncertainty while retaining those with high consistency. Moreover, a federated class-based weighted strategy is also implemented in FOSDA to maintain the category information of the source clients. We have conducted comprehensive experiments on three benchmark datasets to evaluate the performance of the proposed method, and the results demonstrate the effectiveness of FOSDA.

Index Terms— Federated learning (FL), open set domain adaptation (OSDA), uncertainty-aware mechanism, weighted strategy.

Manuscript received 6 March 2022; revised 15 July 2022 and 3 August 2022; accepted 4 October 2022. Date of publication 28 October 2022; date of current version 4 June 2024. This work was supported in part by the National Key Research and Development Program of China under Grant 2019YFB2101901; in part by the National Natural Science Foundation of China under Grant 62076179, Grant 61732011, and Grant 61925602; and in part by the Haihe Laboratory of ITAI under Grant 22HHXCJC00002. (Corresponding author: Liu Yang.)

Zixuan Qin, Liu Yang, Fei Gao, and Qinghua Hu are with the College of Intelligence and Computing, the Engineering Research Center of City Intelligence and Digital Governance, and the Tianjin Key Laboratory of Machine Learning, Ministry of Education, Tianjin University, Tianjin 300350, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Chenyang Shen is with the Division of Medical Physics and Engineering, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TNNLS.2022.3214930

I. INTRODUCTION

IN RECENT years, deep neural network methods have achieved substantial success in many fields, such as computer vision tasks. However, a large amount of labeled data is required to train an effective deep neural network. In practical applications, the collection of large-scale annotation data is time consuming and expensive. Domain adaptation [1], [2], [3], [4], [5], [6] can be used to transfer knowledge from related source domains with abundant labeled data to improve the recognition performance in the unlabeled target domain. Domain adaptation methods have been applied to solve small sample problems in various scenarios [7], [8], [9], [10]. In general, domain adaptation only considers a situation in which the source domain has the same classes as the target domain. However, in reality, new categories are often encountered in the target domain that are not observed in the source domain. Thus, the environment of these models changes from the previous closed set to an open set. Simultaneously, these unknown categories interfere with the transfer effect.

To solve this problem, several open set domain adaptation (OSDA) methods [11], [12], [13], [14], [15], [16] have been proposed to deal with the problem of new classes. The goal of OSDA methods is to identify the data of the common classes between the source and target domains in the target domain and to reject the data of all unknown categories in the target domain. The existing methods benefit from shared source and target data. However, in real-application scenarios, the data of the source and target domains cannot be shared, which leads to the failure of domain-adaptive methods.

In several recent source-free works [17], [18], [19], [20], the model was trained without the domains accessing the source and target data from one another. The model was first trained with the source domain and then further trained with the target domain. Thus, the target domain would not obtain the source data, thereby ensuring data security for both parties to a certain extent. However, these works only focused on the domain adaptation setting. Subsequently, two models [21], [22] were extended to solve OSDA problems.

In most applications, the source and target data are stored in various clients, as illustrated in Fig. 1. The aforementioned methods only transfer information from one domain to another, and cannot aggregate information from all clients. It is difficult for these methods to combine the data that are distributed to the clients for a unified model with the original frameworks.

Federated learning (FL) [23], [24], [25], [26] is a suitable strategy for satisfying the distributed training and privacy protection requirements [27], [28]. It has been widely applied in various areas, such as recommendation systems, medical diagnosis, and autonomous driving [29]. FL methods [23] have been proven to be robust in many situations and can guarantee model convergence. Subsequently, certain works [30], [31], [32], [33] demonstrated that naive averaging can affect the federated performance, particularly when the data are highly skewed in a non-IID setting. Moreover, federated adversarial domain adaptation (FADA) [34] has been proposed to solve multisource domain adaptation tasks by considering domain adaptation in an FL system.

Fig. 1. Illustration of OSDA in a real-application scenario without data sharing. Suppose that N source domain clients and one target domain client exist. Owing to privacy protection, the clients cannot share data with one another. In addition to the categories in the source domain clients, the target domain client has its own private categories. For example, the first source domain client has samples of "alarm clock," "bike," and "chair," whereas the final source domain client has samples of "chair" and "fork." However, the target domain client has a collection of source domain client categories, as well as unknown categories, such as "telephone," "flower," and "pen."

Following FADA, knowledge distillation-based decentralized domain adaptation (KD3A) [35] has also been used to deal with multisource domain adaptation tasks in an FL system. However, in these federated domain adaptation methods (FADA and KD3A), the categories of the target domain must be known in the source domains. These methods cannot handle new categories in the target domain that do not appear in the source domain, such as "telephone," "flower," and "pen" in Fig. 1. That is, they cannot solve OSDA problems.

In this study, we aim to solve the problem of OSDA in a wilder and more realistic environment, namely a federated setting. We assume that all of the source and target domain data are stored in different clients and that private data cannot be shared among clients. We propose the federated OSDA (FOSDA) method, which effectively combines the distributed data of the source domains to train a global model for the OSDA task in the target domain client. One challenge is the recognition of the unknown classes in the target domain using models that are trained by source clients and without source data. We propose the training of each source local model for OSDA with the aid of the target domain client. As illustrated in Fig. 2, the local model updating is a distributed training period in which clients from different source domains train their own models. Each local client model can solve the OSDA problem by setting a pseudoboundary between the known and unknown classes with an adversarial network.

Another important challenge is the evaluation of the contribution of each client owing to the uncertainty in the source clients during federated aggregation. We introduce a federated uncertainty-aware strategy to learn the contribution of each client update automatically. To reduce the uncertainty of the federated aggregation, we suggest focusing on the contribution of the source clients with high uncertainty and retaining those with high consistency. Furthermore, we adopt a federated class-based weighted strategy to retain category information while aggregating all client updates to deal with the different categories of source clients.

The main contributions of this work can be summarized as follows:

1) This is the first time that OSDA problems are solved in a federated setting. In particular, FOSDA is proposed, which can unite all clients to train a global model to solve OSDA problems, while the training data remain local under privacy protection.

2) The target domain client adopts an uncertainty-aware mechanism to generate a global model from all local models in the federated aggregation stage. This reduces the uncertainty of the federated aggregation by focusing on the contribution of the source clients with high uncertainty while maintaining those with high consistency. Moreover, a federated class-based weighted strategy is employed to maintain the category information of the source clients.

3) Various experiments on three datasets demonstrate that FOSDA can effectively retain the importance of each client, and the recognition accuracy of the unknown target categories is improved significantly.

The remainder of this paper is organized as follows. First, we briefly review the related work in Section II. Section III presents an overview of the proposed method and describes the training procedure in detail. Section IV presents a comparison of the performance of FOSDA and state-of-the-art methods. Finally, the conclusions are outlined in Section V.

II. RELATED WORK

OSDA methods have been proposed to leverage recognition of the known and unknown classes in the target domain. However, it is difficult to implement OSDA methods in practical applications where data are distributed to various clients and cannot be exchanged because of privacy protections. FL has been proposed to train an effective global model in which the training data remain distributed among the clients. Although FL solves the need for privacy protection, there is currently no FL algorithm available in the literature that enables OSDA to deal with unknown classes. In this work, federated OSDA (FOSDA) is proposed to accomplish OSDA tasks without the need for data sharing among local clients so that privacy can be protected. In this section, we first review OSDA methods and then introduce FL methods.

A. OSDA

Domain adaptation [1] methods can transfer the information of a labeled domain to an unlabeled domain. The key issue in domain adaptation is the reduction of domain discrepancy. The theoretical boundary for domain adaptation was analyzed in [2], which also revealed that the source risk and domain discrepancy are critical to domain adaptation problems. A deep adaptation network (DAN) [3] was proposed to minimize the distances of the feature distributions of different domains by adding an adaptation layer to a deep network.


Fig. 2. Framework of the proposed FOSDA, which includes four main behaviors: downloading the global model from the target domain client to the source domain clients, local model updating, uploading the client models from the source domain clients to the target domain client, and federated aggregation. $D^1, \ldots, D^N$ correspond to source domain clients 1 to N, and $D^T$ denotes the target domain client.

This method uses the feature transformation capabilities of deep neural networks to match the feature spaces. The residual transfer network [4] applies a shortcut connection and an entropy minimization criterion to the DAN. With the widespread use of generative adversarial models, the domain adversarial neural network [5] was developed. This network uses adversarial learning to train a domain classifier so that the classifier, which is trained using labeled source domain data, is also suitable for the target domain. Although conditional adversarial domain adaptation [6] also employs adversarial learning, it uses the tensor product between the feature representation and the classifier prediction to improve the classifier recognition. Recently, domain adaptation methods have been applied to solve small sample industrial problems. A generalized transfer framework [7] with evolutionary capability has been used for fault diagnosis. Several domain adaptation networks [8], [9], [10] have been proposed to align feature dimensions and match feature distributions in small sample scenarios.

Furthermore, in realistic settings, new categories will be encountered in the target domain, and it is necessary to respond sufficiently to these new categories, which is an OSDA task. The assign and transform iterative method [11] uses metric learning to assign unknown samples iteratively, by means of which the distances between the target samples and the centers of the categories are calculated. However, in this setting, it is necessary to collect several private categories in the source domain to represent the new categories in the target domain. The open set backpropagation (OSBP) method [12] overcomes the limitation of having corresponding new categories in the source domain. A gradient reverse layer is established for the generator to maximize the classifier error. The generator can reject samples as unknown or align them with the source data. Separate to adapt [13] uses a coarse-to-fine weight mechanism to separate unknown samples from the target domain and subsequently align the target and source domains. The open set difference [14] was introduced based on the theoretical bound of OSDA. These works set a hard threshold to identify unknown classes in advance; thus, several methods have been proposed to improve this situation. Furthermore, [15] used a multiclassifier structure to learn this threshold for each sample automatically. Xu et al. [16] introduced the entropy of probability distributions to set a soft threshold for rejecting unknown samples. A comparison of the ablation experiments of these studies demonstrates that OSBP played the most important role. However, in most applications, it is difficult for these methods to share data in different domains to train a unified model.

Source-free unsupervised domain adaptation (SFUDA) has been proposed to deal with the scenario in which the source domain data are not available during target adaptation. SFUDA achieves privacy protection by isolating the source and target data, so that only the unlabeled target data can be used to fine-tune the pretrained source model. Model adaptation [17] uses a class-conditional GAN to generate target-style data, and the GAN and pretrained source model are trained cooperatively. However, the training process consumes a large amount of computing resources. Several class prototype-based methods have been proposed for training the models. In source data-free domain adaptation [18], reliable target samples are selected as class prototypes, and pseudolabels are assigned to the samples based on the similarities between the samples and class prototypes. BAIT [19] trains the classifiers to obtain the source and target prototypes, and pulls the target features toward the prototypes so that the target data are aligned with the source classifier. In contrastive prototype generation and adaptation [20], class prototypes are generated for the source classes via contrastive learning, and pseudolabels are used to align the target data with the class prototypes. However, these methods only focus on the domain adaptation problem, and they cannot deal with the unknown categories in open set scenarios.

Source hypothesis transfer (SHOT) [21] retains the source domain hypothesis and applies the semi-supervised pseudolabel strategy during target adaptation. It can be extended to handle new categories without sharing the source data.

The inheritable vendor–client (IVC) paradigm [22] can also solve OSDA problems. It generates negative samples as out-of-distribution data with the aid of the source data. Subsequently, the IVC trains the source model with the source data and negative samples together, and pseudolabeling is applied during the target adaptation. However, these methods only transfer information from one client to another, and they are not suitable for aggregating information from all clients.

B. Federated Learning

With the development of terminal devices, data are being distributed and stored in a large number of private clients. Owing to privacy and security policies, clients cannot share data with one another. Therefore, it is necessary to train a model with clients in a distributed manner instead of directly pooling the data of all clients together. FL [23], [24] is an approach that enables multiple clients to aggregate for a global model, and it meets the requirements of privacy and security, where each client can only update the model using local data. FL has been widely applied in various areas because of its ability to provide privacy protection [29]. Google established an FL network among mobile users to improve the quality of keyboard input prediction and promote the development of the recommendation system [36]. The federated diagnosis of electronic health records [37] is used in different hospitals by reducing the contribution of models with high uncertainty. Federated transfer reinforcement learning [38] is designed for autonomous driving, and it can transfer the knowledge of agents in real time. An FL framework [39] is used to jointly train optimized credit card models in order to increase the accuracy of customer risk determination.

A research issue in FL is the determination of a set of suitable client models and the design of aggregation strategies. Averaging is a classic strategy that combines all of the models from local clients. FedAvg [23] is an effective FL strategy, by means of which all client models in the federated aggregation stage are weighted according to the amount of data that is local to the client. All of the clients are at an equal level. During the training process of FedAvg, a certain number of clients are selected to attend aggregation in every iteration, and the number of local iterations of different clients is the same. Communication-mitigated federated learning (CMFL) [24] is a generalization of FedAvg that requires relevant client updates to be uploaded to participate in the global model aggregation and ignores irrelevant client updates. Relevant client updates are identified by calculating the percentage of client updates with different signs compared with their counterparts in the global update. If the percentage is greater than a certain threshold, the client is considered a relevant client update. However, it has been proven that this naive averaging of model parameters can affect the federated performance [30], particularly when the data follow a highly skewed non-IID distribution. A data-quality-based scheduling algorithm with uncertainty measures [40] has been proposed to prioritize reliable devices with rich and diverse datasets, but it still needs to be evaluated on a publicly available dataset. The federated uncertainty-aware learning algorithm [37] was proposed to improve FedAvg in the context of electronic health records. In this approach, the clients' local data are used as a test set to estimate the federated model performance, and hence it is not suitable for domain adaptation scenarios. The distance between the client parameters, such as the Euclidean distance, was computed in [31], and the more correlated parameters were selected for model integration. To a certain extent, it can be confirmed that the averaging strategy may not be an optimal approach. The contribution of each client offers unexpected benefits in FL. The Shapley value [32] is used to measure the contribution of different clients to FL. Model-contrastive FL (MOON) [33] was proposed to further resolve the heterogeneity of the local data distribution across parties. The key concept of MOON is to use the similarity between the model representations to correct the local training of the clients. In the local training, MOON corrects the updating direction by introducing the MOON loss, the objective of which is to reduce the distance between the current local and global models, as well as to increase the distance between the current and previous local models, thereby reducing the impact of the non-IID distribution. Although MOON takes into account the problem of non-IID distribution, it does not consider domain adaptation.

FL has been applied to domain adaptation tasks in several studies. FADA [34] was proposed to solve multisource domain adaptation tasks in an FL system. This method provides a detailed analysis of the convergence of different client updates. KD3A [35] performs domain adaptation through knowledge distillation on models from different source domains under a privacy-preserving policy. Specifically, it designs a new domain to obtain consistent knowledge from every source domain and adds this domain to the final model aggregation process. However, these methods cannot handle OSDA in a federated setting. Moreover, they only use naive averaging of the model parameters, which affects the federated performance, especially for non-IID data. We consider the aggregation of all clients based on uncertainty to train an OSDA model for the target client without data sharing.

III. METHODOLOGY

In this section, we present the proposed FOSDA framework. As illustrated in Fig. 2, the entire FOSDA process is divided into four stages: downloading a global model from the target domain client to the source domain clients, updating the local model with the source domain client data, uploading the client local models and related information to the target domain client, and implementing federated aggregation with the corresponding uncertainty-aware strategy. The local model updating and federated aggregation stages are the most critical stages of the FOSDA. First, we define the OSDA problem in a federated environment. Thereafter, the local model updating and federated aggregation stages in the FOSDA are described in detail.

A. Problem Statement

FOSDA is proposed to unite scattered source domain clients to train a global model for OSDA tasks on the target domain client, in which the client data always remain local.


TABLE I
SUMMARY OF NOTATIONS

Fig. 3. Model updating. The source domain client uploads a generator $G^s$ and a classifier $C^s$ to the target domain, and a gradient flip layer is added to complete the adversarial learning process.

Let $D^S = \{D^s\}_{s=1}^{N}$ denote the N source domain clients, where $D^s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ is the set of $n_s$ labeled source examples that are sampled from a certain distribution $p_s$; $x_i^s$ is a labeled source example and $y_i^s$ is the corresponding label. Furthermore, we have $n_t$ unlabeled examples $D^T = \{x_j^t\}_{j=1}^{n_t}$ that are drawn from $p_t$ in the target domain client. The goal is to predict the labels $\{\hat{y}_j^t\}_{j=1}^{n_t}$ of the unlabeled examples. The data distributions vary owing to the differences among the source clients; that is, $p_{s_v} \neq p_{s_u}$ for $s_v \neq s_u$. Moreover, distribution shifting exists between the source and target; that is, $p_s \neq p_t$. The source and target domain class label sets are denoted by $L_S$ and $L_T$, respectively. $L_S = \bigcup_{s=1}^{N} L_s$ is the set of K known categories from the source domains, where $L_s$ is the label set of the sth client, and the number of labels of $L_s$ may be smaller than K. Furthermore, $L_{s_v}$ and $L_{s_u}$ ($s_v \neq s_u$) may have different categories. As illustrated in Fig. 1, apart from the categories that are shared with the source domain, several unobserved categories may exist in the target domain. $L_{T}' = L_T \setminus L_S$ represents the private labels of the target domain, which are also referred to as the unknown category. Finally, the goal of FOSDA is to aggregate the source clients $D^S$ to learn a (K+1)-dimensional classifier for the target domain, where $1 \sim K$ indicates the known categories and $K+1$ indicates the unknown category. Subsequently, the classifier is effectively used to identify the unlabeled data in the target domain $D^T$. The primary notations are summarized in Table I.

B. Model Updating for OSDA

One of the most critical stages of FOSDA is the client model updating, which is concerned with the use of the source and target domain clients to solve OSDA tasks. The source domain clients operate in parallel during this process. Every source domain client must be united with the target domain client to accomplish the entire model training process. Specifically, a pseudodecision boundary is trained between the known and unknown classes using an adversarial network. In the primary methods, the model must access the source and target data simultaneously to train one model. However, this condition cannot be satisfied in our setting, where the source and target data are distributed and stored in different clients. Therefore, only the model parameters from the source domain clients are transferred to the target domain client. As indicated in Fig. 3, a source domain client model includes a generator $G^s$ and a classifier $C^s$, which are uploaded and updated by the target domain client model. To complete the OSDA task, $C^s$ outputs a (K+1)-dimensional vector, where K is the number of shared class labels and the final dimension of the vector represents the probability that the sample belongs to the unknown class.

The model is trained using the labeled dataset $D^s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ for each source domain client to reduce the classification error of the source samples. The source loss $L_s$, which is composed of the cross-entropy loss $\ell_{ce}$, is determined as follows:

$$L_s\big(\theta_G^s, \theta_C^s\big) = \frac{1}{n_s}\sum_{i=1}^{n_s} \ell_{ce}\Big(C^s\big(G^s\big(x_i^s; \theta_G^s\big); \theta_C^s\big),\, y_i^s\Big) \tag{1}$$

where $\theta_G^s$ and $\theta_C^s$ are the generator and classifier parameters of the sth source client model, respectively.

After updating the generator and classifier of the source domain client, the client transfers the generator and classifier parameters to the target domain client. Subsequently, following the OSBP method [12], the target domain client uses its own unlabeled data $D^T = \{x_j^t\}_{j=1}^{n_t}$ to obtain the boundary between the known categories and the unknown class using a binary cross-entropy and a gradient flip layer between the generator and classifier:

$$L_{adv}^s\big(\theta_G^s, \theta_C^s\big) = \frac{1}{n_t}\sum_{j=1}^{n_t} \ell_{ce}\Big(C^s\big(G^s\big(x_j^t; \theta_G^s\big); \theta_C^s\big),\, y_j^t\big|_{K+1}\Big). \tag{2}$$

This is a cross-entropy loss that measures the sample belonging to the (K+1)th class with the sth source client model, where $y_j^t|_{K+1}$ is the output of the classifier. The threshold for the classifier is set to 0.5, which is a good boundary between the known and unknown classes according to [12]. If $y_j^t|_{K+1}$ is lower than 0.5, the sample is aligned with the source. If it is larger than 0.5, the sample is identified as belonging to an unknown class.

The entire training process is a min-max game with a gradient flip layer between the generator and classifier. The classifier $C^s$ is trained to establish a boundary between the known and unknown classes. However, the generator $G^s$ is trained to deceive $C^s$ with the gradient flip layer. During adversarial training, $G^s$ can opt to increase or decrease the probability of maximizing the classifier error, which means that $G^s$ can align the target samples with the source domain, and it can also treat a sample $x_j^t$ as an unknown sample that will be rejected.
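To make the min-max training concrete, the following is a minimal PyTorch sketch of the gradient flip layer and the two losses in (1) and (2). The module interfaces, the explicit binary form of the boundary loss, and the fixed boundary t = 0.5 follow the description in the text, but the names and details are illustrative assumptions rather than the authors' released implementation.

import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Gradient flip layer: identity in the forward pass, negated gradient in
    # the backward pass, so the generator maximizes the very loss that the
    # classifier minimizes (the min-max objectives (3) and (4) below).
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

def source_loss(G, C, x_s, y_s):
    # Eq. (1): cross-entropy over the (K+1)-way output on labeled source data.
    return F.cross_entropy(C(G(x_s)), y_s)

def boundary_loss(G, C, x_t, t=0.5):
    # Eq. (2): binary cross-entropy pushing the probability of the (K+1)-th
    # "unknown" output toward the pseudoboundary t = 0.5 from [12].
    p_unk = F.softmax(C(GradReverse.apply(G(x_t))), dim=1)[:, -1]
    p_unk = p_unk.clamp(1e-6, 1.0 - 1e-6)
    return -(t * torch.log(p_unk) + (1.0 - t) * torch.log(1.0 - p_unk)).mean()

Because the flip layer sits between the generator and the classifier, a single backward pass through boundary_loss realizes the opposing updates of (5) and (6): the classifier descends on the loss while the generator receives the reversed gradient.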

With reference to the above-mentioned analysis, the objectives of the generator and classifier in the target domain client are

$$\min_{\theta_G^s} \; -L_{adv}^s\big(\theta_G^s\big) \tag{3}$$

$$\min_{\theta_C^s} \; L_{adv}^s\big(\theta_C^s\big). \tag{4}$$

Equations (1) and (3) can be optimized using backpropagation. A saddle point can be identified as a stationary point of the following stochastic updates:

$$\big(\hat{\theta}_G^s\big)^{\tau+1} \leftarrow \big(\theta_G^s\big)^{\tau} - \eta_1 \nabla L_s\big(\theta_G^s\big), \qquad \big(\theta_G^s\big)^{\tau+1} \leftarrow \big(\hat{\theta}_G^s\big)^{\tau+1} + \eta_2 \nabla L_{adv}^s\big(\theta_G^s\big) \tag{5}$$

where $\eta_1$ is the source domain learning rate, $\eta_2$ is the target domain learning rate, and $\tau$ is the iteration number. $(\hat{\theta}_G^s)^{\tau+1}$ is transferred to the target client, and the updated $(\theta_G^s)^{\tau+1}$ is retained by the target domain client for federated aggregation. Similarly, problems (1) and (4) can be solved to update $\theta_C^s$:

$$\big(\hat{\theta}_C^s\big)^{\tau+1} \leftarrow \big(\theta_C^s\big)^{\tau} - \eta_1 \nabla L_s\big(\theta_C^s\big), \qquad \big(\theta_C^s\big)^{\tau+1} \leftarrow \big(\hat{\theta}_C^s\big)^{\tau+1} - \eta_2 \nabla L_{adv}^s\big(\theta_C^s\big). \tag{6}$$

Following the target domain client updates, the model updating is completed. Thereafter, all of the models are aggregated on the target client for further federated aggregation.

C. Uncertainty-Aware Federated Aggregation

Fig. 2 depicts the stage in which the local updates are aggregated for a global model. Considering the contribution of each source domain client model to the target domain and the differences in the category information of the source domain clients, an uncertainty-aware strategy is proposed to replace the simple averaging process.

1) Federated Self-Weighted Strategy: In general, the data distributions of the source domain clients $D^S = \{D^s\}_{s=1}^{N}$ differ significantly. In this case, different source domain clients will have very different convergence rates, and the importance of the different clients varies in each aggregation. It is crucial to measure the contribution of each source domain client to the target domain task to avoid negative transfers from invalid source domain clients. A self-weighted strategy is proposed to address this problem. By applying this strategy, local source domain client models that are useful for the final task will have a stronger effect, and those that are unfavorable for the final task will be restricted.

a) Diversity-based weights: The value of the gap statistics [41] is used to demonstrate the ability of one client for measuring the uncertainty of the clients. Prior to calculating the gap statistics, the k-means clustering algorithm is used to cluster the features f that are extracted by a specific model, following which the gap statistics can be recorded as the sum of the intraclass distances.

In the OSDA problem, the value of the gap statistics of the target features can be used to reflect the capability of the model on the target domain client. Furthermore, the gap between the initial global model and the client model, which is updated by the local private data, can be used to measure the contribution of the client to the federated aggregation.

Fig. 4. Conceptual illustration of the clustering results with different features. The features on the top left are the target features that are extracted by the global model in the final iteration, and the features on the top right are the target features that are extracted by the average model. The features in the lower left corner, lower center, and lower right corner are the target features that are extracted by client models 1, 2, and 3, respectively. The values in the boxes are the gap statistics of the corresponding features, diversity-based weights, and consistency-based weights.

A conceptual illustration of the clustering results is presented in Fig. 4, where the target features become more compact after updating the global model with client 1, which means that the gap statistics of these features are smaller. However, after updating the global model with client 2, the distances between the features remain almost unchanged. Therefore, the contribution of client 1 to processing the target task is greater than that of client 2.

In the $\tau$th round of federated aggregation, the gap statistics of the target features using the sth client model, which is trained with its own local data, are determined as follows:

$$(\mu_s)^{\tau} = \sum_{r=1}^{R} \frac{1}{2n_r} \sum_{a,b \in C_r} \big\| (f_a^s)^{\tau} - (f_b^s)^{\tau} \big\|_2 \tag{7}$$

where $\{C_r\}_{r=1}^{R}$ denotes the set of R clusters that are obtained by k-means, $f_a^s$ and $f_b^s$ are two target features belonging to cluster $C_r$ that are extracted by the sth client model, and $\sum_{a,b \in C_r} \|(f_a^s)^{\tau} - (f_b^s)^{\tau}\|_2$ measures the sum of the intraclass distances in class $C_r$.

The gap statistics of the target features with the $(\tau-1)$th global model are

$$(\tilde{\mu})^{\tau-1} = \sum_{r=1}^{R} \frac{1}{2n_r} \sum_{a,b \in C_r} \big\| \tilde{f}_a^{\tau-1} - \tilde{f}_b^{\tau-1} \big\|_2 \tag{8}$$

where $\tilde{f}^{\tau-1}$ denotes the target features that are extracted by the $(\tau-1)$th global model.

Therefore, the contribution of the sth client is

$$(\tilde{w}_s)^{\tau} = \big| \tilde{\mu}^{\tau-1} - (\mu_s)^{\tau} \big|. \tag{9}$$
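As a concrete illustration, the gap statistics of (7) and (8) and the diversity-based weight of (9) can be computed as in the following sketch; the use of scikit-learn's k-means and the function names are assumptions made for illustration, not part of the paper.

import numpy as np
from sklearn.cluster import KMeans

def gap_statistic(features, n_clusters):
    # Eqs. (7) and (8) (and (10) below) share this form: cluster the target
    # features with k-means, then sum over the R clusters the scaled
    # intraclass pairwise distances (1 / (2 n_r)) * sum ||f_a - f_b||_2.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    total = 0.0
    for r in range(n_clusters):
        cluster = features[labels == r]
        if len(cluster) == 0:
            continue
        diffs = cluster[:, None, :] - cluster[None, :, :]
        total += np.linalg.norm(diffs, axis=-1).sum() / (2 * len(cluster))
    return total

def diversity_weight(mu_prev_global, mu_client):
    # Eq. (9): a client's contribution is the absolute change in the gap
    # statistic relative to the previous global model.
    return abs(mu_prev_global - mu_client)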

b) Consistency-based weights: The value of $\tilde{w}_s$ will be higher if the effect of a client model is more obvious. However, it is difficult to determine whether this change will benefit the final model. As illustrated in Fig. 4, after updating the global model with clients 1 and 3, the gaps between the gap statistics of the global model and those of clients 1 and 3 are both large. However, client 1 plays a significant role, whereas client 3 plays a negative role in the target tasks. To avoid this situation, according to the principle that most clients make positive contributions to the final model, we consider another factor in addition to $\tilde{w}_s$. This factor means that if a model deviates from all client models, we believe that the model will have adverse effects, and we assign it a small value. Specifically, we calculate an average model of the client models following the local model updating to represent the performance of almost all clients. Thus, the gap statistics of the $\tau$th average model are determined by

$$(\bar{\mu})^{\tau} = \sum_{r=1}^{R} \frac{1}{2n_r} \sum_{a,b \in C_r} \big\| \bar{f}_a^{\tau} - \bar{f}_b^{\tau} \big\|_2. \tag{10}$$

The difference between the gap statistics of the client model and those of the average model can reveal the status of the local client:

$$(\bar{w}_s)^{\tau} = \big| \bar{\mu}^{\tau} - (\mu_s)^{\tau} \big|. \tag{11}$$

c) Joint weights: For simplicity and convenience, we omit the symbol $\tau$ of the weights in this section. A larger $\tilde{w}_s$ indicates a greater contribution of the model, but $\bar{w}_s$ restricts all models to a normal range, expressing the opposite trend. To achieve this effect, the weights are first preprocessed:

$$\tilde{w}_s = \frac{\tilde{w}_s - \min(\tilde{w}_s)}{\max(\tilde{w}_s) - \min(\tilde{w}_s)} \tag{12}$$

$$\bar{w}_s = \frac{\max(\bar{w}_s) - \bar{w}_s}{\max(\bar{w}_s) - \min(\bar{w}_s)} \tag{13}$$

where min and max denote the minimum and maximum functions, respectively. The softmax function is subsequently used for normalization.

The final weight is

$$w_s = \alpha \tilde{w}_s + (1 - \alpha)\bar{w}_s \tag{14}$$

where $\alpha$ denotes a hyperparameter.

Therefore, the combined global model of the generator is as follows:

$$\tilde{\theta}_G = \sum_{s=1}^{N} w_s \theta_G^s. \tag{15}$$

Except for the final layer, the combined global model of the classifier is as follows:

$$\tilde{\theta}_C = \sum_{s=1}^{N} w_s \theta_C^s. \tag{16}$$

2) Federated Class-Based Weighted Strategy: As illustrated in Fig. 1, different source domain clients will have different numbers and types of categories. If the category information is considered during the process of aggregating the models, the final global model will benefit significantly. Therefore, in our setting, the category information of the source domain clients must be uploaded to the target domain client. Because only the categories that are owned by each source domain are uploaded and the original data are not shared, no privacy protection issues are involved. The parameters of the final fully connected layer can best represent the discriminative category information for a neural network. While aggregating the final fully connected layer, the category weights should be adopted in addition to the weights that are involved in the federated self-weighted strategy. For example, in Fig. 1, the probabilities that a sample belongs to "alarm clock," "bike," "chair," "fork," and "unknown" constitute the output of the final fully connected layer, and each class corresponds to a neuron. As the first client has the labels "alarm clock," "bike," and "chair," the model will be more sensitive to data with these labels. The related parameters in the final fully connected layer are reserved, which is formalized as $\text{sign}(\theta_{C_k}^s) = 1$ (k = 1, 2, 3). However, as the first client has no labeled data with "fork," the related parameters in the final fully connected layer will be discarded, which is formalized as $\text{sign}(\theta_{C_k}^s) = 0$ (k = 4). Because each client model is beneficial for identifying an unknown class, the relevant parameters of every client should be preserved. The final fully connected layer can be aggregated as

$$\tilde{\theta}_{C_k} = \sum_{s=1}^{N} w_s\, \text{sign}\big(\theta_{C_k}^s\big)\, \theta_{C_k}^s, \qquad k = 1, \ldots, K+1. \tag{17}$$

The federated self-weighted and federated class-based weighted strategies are very flexible and selectable.

3) Optimization: The proposed FOSDA model is trained in an end-to-end fashion using stochastic gradient descent [42]. The model updating for OSDA and the uncertainty-aware federated aggregation are minimized together with the task loss. The overall FOSDA process is described in Algorithm 1, and the flowchart is shown in Fig. 5. After training, the trained global generator parameters $\tilde{\theta}_G$ and global classifier parameters $\tilde{\theta}_C$ can be used to predict the labels of the target data.
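A compact sketch of the aggregation in (12)-(17) follows. The exact placement of the softmax normalization and the key naming the final fully connected layer are illustrative assumptions; the text specifies the min-max preprocessing, the softmax normalization, and the sign(·) masks, but not an implementation.

import numpy as np

def joint_weights(w_div, w_con, alpha):
    # Eqs. (12)-(13): min-max scale both terms; w_con is flipped so that a
    # smaller deviation from the average model yields a larger weight.
    w_div = (w_div - w_div.min()) / (w_div.max() - w_div.min() + 1e-12)
    w_con = (w_con.max() - w_con) / (w_con.max() - w_con.min() + 1e-12)
    # Eq. (14), followed by the softmax normalization mentioned in the text.
    w = alpha * w_div + (1.0 - alpha) * w_con
    e = np.exp(w - w.max())
    return e / e.sum()

def aggregate(client_params, weights, class_masks, fc_key="classifier.fc.weight"):
    # Eqs. (15)-(16): weighted sums of the client parameters. Eq. (17): for
    # the final fully connected layer (hypothetical key fc_key), output rows
    # of classes a client does not own are zeroed by its 0/1 sign(.) mask;
    # the (K+1)-th "unknown" entry of every mask stays 1.
    global_params = {}
    for name in client_params[0]:
        if name == fc_key:
            global_params[name] = sum(
                w * m[:, None] * p[name]
                for w, m, p in zip(weights, class_masks, client_params))
        else:
            global_params[name] = sum(
                w * p[name] for w, p in zip(weights, client_params))
    return global_params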
Fig. 5. Flowchart of FOSDA.

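For orientation, the condensed Python sketch below mirrors the four-stage round that Fig. 5 and Algorithm 1 (next) describe. The client objects and their methods are hypothetical stand-ins for the components defined in Sections III-B and III-C, and average_model_features is assumed to extract target features with the parameter-averaged model.

import numpy as np

def fosda_round(clients, global_params, mu_prev_global, alpha, n_clusters):
    # One communication round of Algorithm 1 (schematic); gap_statistic,
    # joint_weights, and aggregate are the sketches given earlier.
    for c in clients:                       # download the global model
        c.load_state(global_params)
    for c in clients:                       # local source updates, Eq. (1)
        c.train_source_epochs()
    for c in clients:                       # adversarial updates, Eqs. (2)-(6)
        c.train_adversarial_epochs()
    # Federated aggregation, Eqs. (9) and (11)-(17).
    mu = np.array([gap_statistic(c.target_features(), n_clusters)
                   for c in clients])
    mu_avg = gap_statistic(average_model_features(clients), n_clusters)
    w_div = np.abs(mu_prev_global - mu)     # Eq. (9)
    w_con = np.abs(mu_avg - mu)             # Eq. (11)
    weights = joint_weights(w_div, w_con, alpha)
    return aggregate([c.params() for c in clients], weights,
                     [c.class_mask() for c in clients])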
Algorithm 1 FOSDA

Input: N source domain clients $D^S = \{D^s\}_{s=1}^{N}$, where $D^s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$; target domain client $D^T = \{x_j^t\}_{j=1}^{n_t}$; source domain learning rate $\eta_1$ and target domain learning rate $\eta_2$; number of source domain training epochs $E_s$ and number of target domain training epochs $E_t$; number of global epochs $E_g$.
Output: Trained global generator parameters $\tilde{\theta}_G$ and global classifier parameters $\tilde{\theta}_C$.
Initialize the source domain client models: N generator parameters $\theta_G^1, \theta_G^2, \ldots, \theta_G^N$ and N classifier parameters $\theta_C^1, \theta_C^2, \ldots, \theta_C^N$.
for $\tau$ = 1 : $E_g$ do
    Download the global model to the source domain clients.
    Model updating:
    for s = 1 : N in parallel do
        for l = 1 : $E_s$ do
            Compute the cross-entropy loss on the source domain client using Eq. (1).
        end
    end
    Upload the client models to the target domain client.
    for s = 1 : N in parallel do
        for l = 1 : $E_t$ do
            Compute the binary cross-entropy loss on the target domain client using Eq. (2);
            Update $\theta_G^s$ and $\theta_C^s$ using Eqs. (5) and (6).
        end
    end
    Federated aggregation:
    Compute the gap $\tilde{w}_s$ between the gap statistics of the initial global model and those of the client model using Eq. (9);
    Compute the difference $\bar{w}_s$ between the gap statistics of the client model and those of the average model using Eq. (11);
    Normalize the weights using Eqs. (12) and (13);
    Compute the final weight $w_s$ using Eq. (14);
    Update $\tilde{\theta}_G$ by weighting $\theta_G^1, \theta_G^2, \ldots, \theta_G^N$ using Eq. (15);
    Update $\tilde{\theta}_C$ using Eq. (16) for the front layers and Eq. (17) for the final fully connected layer.
end
Return $\tilde{\theta}_G$, $\tilde{\theta}_C$.

D. Generalization Bound for FOSDA

A hypothesis is a function h with an error with respect to the ground-truth labeling function. The risk and empirical risk of a hypothesis on $D^S$ are denoted as $\epsilon_S(h)$ and $\hat{\epsilon}_S(h)$, respectively. Similarly, the risk and empirical risk on $D^T$ are denoted as $\epsilon_T(h)$ and $\hat{\epsilon}_T(h)$. Moreover, $\epsilon_T(h_{K+1})$ is the empirical risk of h on the unknown categories.

Let $\mathcal{H}$ be a hypothesis class with VC-dimension d, and let $\hat{D}^S$ and $\hat{D}^T$ be empirical distributions induced by a sample of size m from the source domain and the target domain in an FL system, respectively. $d_{\mathcal{H}\Delta\mathcal{H}}(\hat{D}^S, \hat{D}^T)$ denotes the divergence between the source and target domains. In addition, $\lambda$ is the combined source and target risk of the optimal hypothesis. Then, with a probability of at least $1 - \delta$ over the choice of samples, for each $h \in \mathcal{H}$, we have

$$\epsilon_T(h) \le \hat{\epsilon}_S(h) + 4\sqrt{\frac{2d\log(2m) + \log(4/\delta)}{m}} + \epsilon_T(h_{K+1}) + \frac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\big(\hat{D}^S, \hat{D}^T\big) + \lambda. \tag{18}$$

The bound in (18) is extended from the error bound in [34] for domain adaptation by taking the open set empirical risk into account. In the federated system, the target model cannot directly access the data stored on the different source clients because of privacy protection. To address this issue, separate models are learned for each distributed source domain, $h^S = \{h^s\}_{s=1}^{N}$, where $h^s$ is a hypothesis for the sth source domain. Consider a combined source domain that is equivalent to a mixture distribution of the N source domains, with the mixture weights $w_s \in \mathbb{R}^+$ and $\sum_{s=1}^{N} w_s = 1$. Then, the target hypothesis $h^T$ is the aggregation of the parameters of the $h^s$, i.e., $h^T := \sum_{s=1}^{N} w_s h^s$.

The mini-batch optimization method is usually employed in deep learning. Let $\hat{S}$ be a mixture of source samples of size Nm, where N is the number of sources and m is the batch size of the samples. Then, the bound in (18) becomes

$$\epsilon_T(h^T) \le \hat{\epsilon}_S\Big(\sum_{s=1}^{N} w_s h^s\Big) + 4\sqrt{\frac{2d\log(2Nm) + \log(4/\delta)}{Nm}} + \epsilon_T\big(h_{K+1}^T\big) + \frac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\big(\hat{D}^S, \hat{D}^T\big) + \lambda. \tag{19}$$

Following [34], the upper bound of $d_{\mathcal{H}\Delta\mathcal{H}}(\hat{D}^S, \hat{D}^T)$ can be derived as $d_{\mathcal{H}\Delta\mathcal{H}}(\hat{D}^S, \hat{D}^T) \le \sum_{s=1}^{N} w_s\, d_{\mathcal{H}\Delta\mathcal{H}}(\hat{D}^s, \hat{D}^T)$. Similarly, with the triangle inequality property, we have $\lambda \le \sum_{s=1}^{N} w_s \lambda_s$. Replacing $d_{\mathcal{H}\Delta\mathcal{H}}(\hat{D}^S, \hat{D}^T)$ and $\lambda$ in (19), we can derive

$$\epsilon_T(h^T) \le \hat{\epsilon}_S\Big(\sum_{s=1}^{N} w_s h^s\Big) + 4\sqrt{\frac{2d\log(2Nm) + \log(4/\delta)}{Nm}} + \epsilon_T\big(h_{K+1}^T\big) + \sum_{s=1}^{N} w_s \Big(\frac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\big(\hat{D}^s, \hat{D}^T\big) + \lambda_s\Big). \tag{20}$$

Then, the weighted error bound for federated domain adaptation with the open set empirical risk is

$$\epsilon_T(h^T) \le \underbrace{\hat{\epsilon}_S\Big(\sum_{s=1}^{N} w_s h^s\Big)}_{\text{source risk}} + \underbrace{4\sqrt{\frac{2d\log(2Nm) + \log(4/\delta)}{Nm}}}_{\text{VC-dimension constraint}} + \underbrace{\epsilon_T\big(h_{K+1}^T\big)}_{\text{open set risk}} + \sum_{s=1}^{N} w_s \Big(\underbrace{\frac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\big(\hat{D}^s, \hat{D}^T\big)}_{\text{divergence}} + \lambda_s\Big). \tag{21}$$

IV. EXPERIMENTS

In this section, we describe extensive experiments that were conducted on three standard benchmark datasets to evaluate the effectiveness of FOSDA.

A. Datasets

The dataset and data allocation for each client in this experiment are presented in Table II.

TABLE II
DESCRIPTION OF DATASETS

Digits [43] consists of three domains: modified NIST (MNIST) (M) [44], Street View House Numbers (SVHN) (S), and United States Postal Service (USPS) (U) [45]. Each domain corresponds to ten categories. Following the previous work [12], we selected 0-4 as the known classes and regarded 5-9 as the unknown classes to satisfy the open set requirements. The transfer tasks included S → M, M → U, and U → M. For the U → M task, U as the source domain had only 145 samples, whereas M had 60 000 samples. The number of samples in the source and target domains was severely unbalanced. Moreover, under the federated experimental setting, the samples of U would be divided among multiple clients, which would cause the imbalance to be more obvious and affect the overall performance. Therefore, in the following experiments, we selected 10% of the M samples by category as a new dataset to reduce the imbalance.

Office-31 [46] is a standard domain adaptation dataset that contains three domains: Amazon (A), Dslr (D), and Webcam (W). The features of Office-31 were extracted using ResNet50, as applied in [47], and the dimension of these features was 2048. For the OSDA tasks, we followed a previous work [12] to select the first ten categories in alphabetical order as the known classes and categories 21-31 as the unknown classes.

Office–Home [48] is frequently used for OSDA tasks. In the following experiments, we used the features that were extracted using ResNet50, as applied in [47]. In comparison with Office-31, the OSDA tasks were more challenging in Office–Home. Artistic (A), clipart (C), product (P), and real-world (R) constitute Office–Home, and the number of categories in this dataset is 65. We selected the first 25 categories as the known classes and the remainder as the unknown classes.

B. Experimental Setting

In this study, we applied FOSDA to solve OSDA problems in a federated setting. All source domain clients were sampled from one source domain, and the target domain was considered an independent client. Specifically, there were a total of 50 source clients for the tasks on Digits. A total of 30 clients were randomly selected during the federated aggregation of the FOSDA. Owing to the small number of samples and the large number of classes, we placed the source data on ten clients for Office-31 and Office–Home and allowed all clients to participate in the aggregation in each round. To further verify the effectiveness of the FOSDA, experiments were conducted considering two scenarios: IID and non-IID. IID was a scenario in which the source data were shuffled and then divided equally into the source domain clients; the data categories owned by the clients were approximately the same. In the non-IID setting, the aim was for each client to have its own private categories; thus, we applied the strategy used in [23]. For example, client 1 could only have the two categories of numbers 0 and 1, whereas client 2 could only have the two categories of numbers 3 and 4. PyTorch was used to run the experiment along with a stochastic gradient descent optimizer with a learning rate of 0.0001. The batch size was set to 32. The number of communication rounds was set to 20 for Digits and 50 for Office-31/Office–Home.

C. Compared Methods

We conducted comparative experiments from two perspectives: 1) the FL methods FedAvg [23], CMFL [24], and MOON [33], as well as weighted FedAvg methods based on the Euclidean distance (FedEuc) [31] and cosine similarity (FedCos), and 2) three domain adaptation methods, SHOT [21], IVC [22], and KD3A [35], which could be extended to solve the federated OSDA problem. None of the compared methods, except for SHOT and IVC, have the ability to deal with unknown categories. Therefore, we followed [13] and used an automatic confidence threshold to determine whether a sample belongs to an unknown class in the testing stage.

1) FedAvg [23] is a prominent classical averaging FL method. All client models are weighted according to their own amount of data in the federated aggregation stage.

2) CMFL [24] is an FL method that simply considers relevant client updates. Prior to the federated aggregation stage, it verifies how well the client update aligns with the global update, and it selects relevant client updates to participate in the federated aggregation.

3) MOON [33] reduces the impact of the data distribution by contrasting the distances between the current local model and the global model, as well as the previous local model, on each client.

4) FedEuc [31] and FedCos are two weighted variants of FedAvg based on the Euclidean distance or cosine similarity between the client model and the average weighted model. A smaller distance or greater similarity results in a greater weight.

5) SHOT [21] is an SFUDA method that retains the source domain hypothesis and uses the pseudolabel strategy during the target adaptation without exposing the source domain data.

6) IVC [22] is an inheritable vendor–client paradigm, which was proposed to transfer the model information from one client to another without exposing the source domain data.

7) KD3A [35] performs domain adaptation through knowledge distillation on models from different source domains under a privacy-preserving policy.
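One simple way to realize the label-skewed non-IID split described in the experimental setting above is sketched below; the per-client class and sample counts are illustrative defaults, and the paper itself follows the partition strategy of [23].

import numpy as np

def split_non_iid(labels, n_clients, classes_per_client=2,
                  samples_per_class=200, seed=0):
    # Each client receives samples from only a few classes (e.g., client 1
    # gets digits 0 and 1), giving every client private categories. Assumes
    # each class has at least samples_per_class examples.
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    client_indices = []
    for _ in range(n_clients):
        chosen = rng.choice(classes, size=classes_per_client, replace=False)
        idx = np.concatenate([
            rng.choice(np.where(labels == c)[0],
                       size=samples_per_class, replace=False)
            for c in chosen])
        client_indices.append(idx)
    return client_indices

With the settings in the text (50 Digits clients with 30 sampled per round, SGD with learning rate 0.0001, and batch size 32), such a split reproduces the kind of label skew illustrated in Fig. 1.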
D. Evaluation Metrics

Following a previous work [12], two evaluation metrics were used: the average accuracy among all classes (OS) and the average accuracy among the known classes (OS*). The accuracy of the unknown classes (UNK) was also considered to reveal the performance on the unknown classes:

$$\text{Acc}(\text{OS}^*) = \frac{1}{K}\sum_{k=1}^{K} \frac{\big|x^t \in D_k^T \wedge \hat{y}_k = y_k\big|}{\big|D_k^T\big|} \tag{22}$$

$$\text{Acc}(\text{OS}) = \frac{1}{K+1}\sum_{k=1}^{K+1} \frac{\big|x^t \in D_k^T \wedge \hat{y}_k = y_k\big|}{\big|D_k^T\big|} \tag{23}$$

$$\text{Acc}(\text{UNK}) = \frac{\big|x^t \in D_{K+1}^T \wedge \hat{y}_{K+1} = y_{K+1}\big|}{\big|D_{K+1}^T\big|} \tag{24}$$

where $\hat{y}$ is the predicted result and $D_k^T$ is the set of target samples with the label $y_k$.
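The three metrics translate directly into code; a minimal NumPy sketch (with labels 0 to K-1 known and label K unknown; the names are illustrative) is as follows.

import numpy as np

def osda_metrics(y_true, y_pred, num_known):
    # Per-class accuracies; class index num_known is the "unknown" class.
    # Assumes every class appears at least once in y_true.
    per_class = np.array([(y_pred[y_true == k] == k).mean()
                          for k in range(num_known + 1)])
    os_star = per_class[:num_known].mean()   # Eq. (22): known classes only
    os_all = per_class.mean()                # Eq. (23): all K+1 classes
    unk = per_class[num_known]               # Eq. (24): unknown class only
    return os_all, os_star, unk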
F. Analysis
1) Domain Discrepancy Between Clients: The A-distance
is a measure of the domain discrepancy that was used in [3].
The A-distance is expressed as d̂A = 2(1−2), where  is the
generalization error of a two-sample classifier (e.g., the kernel
and the average accuracy among known classes (OS*). The SVM) that is trained on the binary problem of distinguishing
accuracy of the unknown classes (UNK) was also considered the input samples between the source and target domains.
to reveal the unknown classes The A-distance was used to verify the difficulty of the task
K  
 ∗ 1  xt ∈ DkT ∧ ŷk = yk  between the source and target domain clients. The W → A
Acc OS =  T (22) and W → D of Office-31, and P → C and R → P of Office–
K k=1 D 
k
Home were selected as examples. The A-distance between the
K +1  
1  xt ∈ DkT ∧ ŷk = yk  source and target domain clients as well as the corresponding
Acc(OS) =  T (23)
K + 1 k=1 D  accuracy are depicted in Fig. 6. Fig. 6(a) and (b) presents the
k
 t  results of Office-31 in the IID and non-IID cases, respectively.
x ∈ D T ∧ ŷ K +1 = y K +1 
Acc(UNK) = K +1
  (24) All but one W → D source client had a smaller A-distance.
x t ∈ D T  Correspondingly, FOSDA could obtain a higher OS and OS*
K +1
on W → D than on W → A. R → P was easier for Office–
where ŷ is the predicted result and DkT is the set of target
Home, with a smaller Adistance. FOSDA obtained a higher
samples with the label yk .
OS and OS* on R → P than on P → C. It can be observed
from Fig. 6 that the transfer task was easier, with a smaller A
E. Results distance, and the accuracy was generally higher for this task.
The results of Digits are summarized in Table III. We con- In contrast, a task with a larger A-distance is more challenging.
ducted comparative experiments on two scenarios with three 2) Ability to Solve Open Set Problems: To demonstrate the
tasks: S → M, U → M, and M → U . In the IID ability of FOSDA to recognize unknown classes, the recog-
case, FOSDA outperformed other FL methods in the different nition accuracy of unknown classes is discussed separately.
evaluation metrics. FOSDA achieved 76.5% on OS and 87.5% As illustrated in Fig. 7(a)–(c), for all of the IID cases, the
on OS*, which indicated an improvement of at least 7.0% on accuracy of FOSDA on the unknown classes was higher than
OS and 6.2% on OS* compared with the other methods. The that of the other methods. In particular, in the Office–Home
tasks in the non-IID case were more difficult than those in A → P IID task, FOSDA achieved 60.2% on UNK, which
the IID case. FOSDA could maintain its effectiveness, with was an improvement of at least 45.7% compared with the
a performance that was 3.5% and 3.2% higher on OS and other methods. Moreover, FOSDA outperformed the other
OS*, respectively, than SHOT, which achieved suboptimal methods by at least 4.4% on OS for all datasets. In com-
performance. The source domain for digits was divided into parison with the other models, FOSDA trained an adversarial
50 clients and the time complexity of KD3A was excessively network to identify unknown classes for OSDA tasks in the
high, as discussed later. client model updating stage. Therefore, FOSDA improved the
The results of Office-31 are listed in Table IV. A total of recognition accuracy of the unknown classes while ensuring
eight methods were used for the comparative experiments. the average accuracy, which was the same for the non-IID
KD3A was included and good performance was achieved. cases. Fig. 7(d)–(f) shows that in the non-IID cases, FOSDA
Similar to FOSDA, it considers the data information of still exhibited better performance compared with the other
the target domain client during the training process, but methods. In particular, UNK of FOSDA on Office–Home was
this is achieved through knowledge voting from the source at least 17.8% higher than that of the other methods.
2) Ability to Solve Open Set Problems: To demonstrate the ability of FOSDA to recognize unknown classes, the recognition accuracy on the unknown classes is discussed separately. As illustrated in Fig. 7(a)-(c), in all of the IID cases, the accuracy of FOSDA on the unknown classes was higher than that of the other methods. In particular, in the Office–Home A → P IID task, FOSDA achieved 60.2% on UNK, an improvement of at least 45.7% over the other methods. Moreover, FOSDA outperformed the other methods by at least 4.4% on OS for all datasets. In comparison with the other models, FOSDA trains an adversarial network to identify unknown classes for OSDA tasks in the client model updating stage. FOSDA therefore improved the recognition accuracy on the unknown classes while maintaining the average accuracy. The same held for the non-IID cases: Fig. 7(d)-(f) shows that FOSDA still exhibited better performance than the other methods, and in particular, the UNK of FOSDA on Office–Home was at least 17.8% higher than that of the other methods.
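The client-side adversarial training is not detailed in this excerpt; the formulation it echoes is open set back-propagation [12], in which a (K+1)-way classifier is trained to push the unknown-class probability of unlabeled target samples toward a fixed boundary t while the feature extractor is trained adversarially against the same loss (e.g., via gradient reversal). A sketch of that boundary loss, under our assumptions about the output layout:

import torch
import torch.nn.functional as F

def open_set_boundary_loss(logits, t=0.5):
    # logits: (B, K+1) classifier outputs on unlabeled target samples,
    # with the last column reserved for the unknown class.
    p_unk = F.softmax(logits, dim=1)[:, -1].clamp(1e-6, 1.0 - 1e-6)
    # Binary cross-entropy against the fixed boundary t: the classifier
    # minimizes this loss while the feature extractor maximizes it, so
    # target samples separate into known (p_unk < t) and unknown (p_unk > t).
    return -(t * torch.log(p_unk) + (1.0 - t) * torch.log(1.0 - p_unk)).mean()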
[TABLE IV: Classification accuracy (%) of OSDA tasks on Office-31 (ResNet-50).]

[TABLE V: OS (%) of OSDA tasks on Office–Home (ResNet-50).]

[TABLE VI: Classification accuracy (%) of different weighted strategies on the Office-31 task for non-IID.]
3) Comparison of Different Weighted Strategies: The federated self-weighted and federated class-based weighted strategies are model aggregation strategies based on the characteristics of the target domain client data. To highlight the advantages of these two strategies, we adjusted FedEuc and FedCos, two weighted federated methods, and renamed them FedEuc* and FedCos*: the client models in FedEuc and FedCos were replaced with those in FOSDA, so that FedEuc* and FedCos* differed from FOSDA only in their weighted strategies. Taking the Office-31 tasks for non-IID as examples, as indicated in Table VI, FOSDA achieved the best results, followed by FedCos*, which is based on cosine similarity. FedEuc* and FedCos* directly use the distance between the client models as the measure for model aggregation and ignore the task information in the target domain. In contrast, FOSDA indirectly uses the statistical characteristics of the target domain client data as the standard for measuring clients during model aggregation, which enables the final model to adapt more effectively to the task of the target domain client. FedEuc* and FedCos* essentially optimize toward an average model as far as possible, ignoring the contribution of each client model in every aggregation round; attention to this contribution is guaranteed by (9) in the federated self-weighted strategy, which focuses on the contribution of each client. Furthermore, in the non-IID scenario, the data classes on the source domain clients were extremely imbalanced, and the classification capability of each client differed significantly across classes. The federated class-based weighted strategy in FOSDA causes each source client to pay more attention to its ability to classify its private class samples. FOSDA outperformed FedEuc* and FedCos* by 5.2% and 6.9% on OS, and by 5.0% and 6.4% on OS*, respectively. Thus, it is clear that, with the local models held fixed, the proposed weighted strategies improved the performance of the federated model significantly.
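As a point of contrast, the similarity-based baseline weighting can be sketched in a few lines; a FedCos*-style rule weights each flattened client model by its cosine similarity to the unweighted mean model. The helper names and the softmax normalization below are our assumptions, and FOSDA's own weights are instead derived from target-domain statistics:

import torch

def flatten(params):
    # Concatenate a list of parameter tensors into one vector.
    return torch.cat([p.reshape(-1) for p in params])

def cosine_weighted_average(client_params):
    # client_params: list of per-client parameter lists (same shapes).
    vecs = torch.stack([flatten(p) for p in client_params])
    mean_vec = vecs.mean(dim=0, keepdim=True)
    sims = torch.nn.functional.cosine_similarity(vecs, mean_vec, dim=1)
    weights = torch.softmax(sims, dim=0)            # assumed normalization
    return (weights.unsqueeze(1) * vecs).sum(dim=0), weights

Nothing in this rule looks at the target task, which is precisely the shortcoming discussed above.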
[Fig. 6: A-distance and corresponding accuracy on the Office-31 W → A and W → D tasks and the Office–Home P → C and R → P tasks, in both the IID and non-IID cases. (a) Office-31 IID tasks. (b) Office-31 non-IID tasks. (c) Office–Home IID tasks. (d) Office–Home non-IID tasks.]

[Fig. 7: Average accuracy among all classes (OS) and accuracy on the unknown classes (UNK) on the Digits M → U, Office-31 W → A, and Office–Home A → P tasks, in both the IID and non-IID cases. (a) Digits M → U IID task. (b) Office-31 W → A IID task. (c) Office–Home A → P IID task. (d) Digits M → U non-IID task. (e) Office-31 W → A non-IID task. (f) Office–Home A → P non-IID task.]
4) Ablation Study: We conducted a series of ablation experiments on the W → A task in the non-IID case to verify the importance of each term in the federated self-weighted and federated class-based weighted strategies. As introduced previously, the federated self-weighted strategy includes two terms, w̃s and w̄s. For clarity, the federated class-based weighted strategy is referred to as the FCWS. Considering all of these terms, the ablation settings were as follows: 1) w̃s; 2) w̄s; 3) FCWS; 4) w̃s + FCWS; 5) w̄s + FCWS; and 6) FOSDA. The final setting combines w̃s, w̄s, and FCWS, with none of them ignored. Furthermore, α in (14), the parameter that balances w̃s and w̄s, was set to 0.7 for this task.

The results of the ablation experiments are listed in Table VII. Among w̃s, w̄s, and FCWS, the OS and OS* of the first were higher than those of the latter two, which demonstrates the importance of measuring the contribution of each client in this task. Comparing w̄s with w̄s + FCWS, FCWS had a negative effect; however, comparing w̃s with w̃s + FCWS, the advantage of FCWS is evident. The final setting, FOSDA, achieved the best performance by combining w̃s, w̄s, and FCWS, with 77.2% on OS and 78.8% on OS*. In our method, all three terms are used. The importance of each term varies across tasks; we emphasize that different tasks have their own uniqueness.
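Equation (14) itself is not reproduced in this excerpt; schematically, the federated self-weighted strategy blends the per-client contribution term w̃s with the consistency term w̄s through α. A minimal sketch, in which the simplex normalization of the blended scores is our assumption:

import numpy as np

def self_weighted_scores(w_tilde, w_bar, alpha=0.7):
    # alpha balances the contribution term (w_tilde) against the
    # consistency term (w_bar); the final normalization is assumed.
    w = alpha * np.asarray(w_tilde) + (1.0 - alpha) * np.asarray(w_bar)
    return w / w.sum()

Under this reading, a larger α emphasizes per-client contributions, consistent with the parameter study below, where the non-IID tasks favored α between 0.7 and 0.9.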
[Fig. 8: Comparison of the results of different methods. For Digits, the U → M task was selected; for Office-31, the W → A task; and for Office–Home, the C → R task. (a) Digits U → M IID task. (b) Office-31 W → A IID task. (c) Office–Home C → R IID task. (d) Digits U → M non-IID task. (e) Office-31 W → A non-IID task. (f) Office–Home C → R non-IID task.]
[TABLE VII: Ablation study on the Office-31 W → A task for non-IID.]

5) Model Convergence Analysis: To demonstrate the convergence and the advantages of FOSDA simultaneously, six tasks on three datasets were selected for experiments in the IID and non-IID cases, as illustrated in Fig. 8. In the IID case, FOSDA outperformed the other methods on all three tasks and converged efficiently with the global iterations. Fig. 8(d)-(f) shows that, in the non-IID case, FOSDA achieved the highest accuracy on the respective tasks on all three datasets, and the model could still converge effectively.

In contrast, the FedAvg algorithm was less stable because it considers neither the contributions of individual clients in the aggregation process nor the uncertainty of the aggregation, which resulted in large fluctuations in the accuracy during training. In addition, KD3A [35] was also unstable. KD3A was not designed for the open set problem: it directly discards ambiguous classification results and focuses on the categories with higher confidence. Under the experimental scenario of OSDA, new categories appear in the target domain, and the resulting increase in ambiguous classification results led to instability in KD3A. The same situation existed in the non-IID case.

[TABLE VIII: Classification accuracy (%) of OSDA tasks on Digits.]

6) Parameter Impact Analysis: The proposed model incorporates the federated self-weighted strategy, in which the parameter α balances w̃s and w̄s in (14). In the experiments, α was varied from 0.1 to 0.9. It can be observed from Fig. 9 that α values of 0.1, 0.3, and 0.3 achieved the best performance for the U → M, W → A, and C → R tasks in the IID scenarios, respectively. In these cases, the data distribution of each client was essentially the same; therefore, model training should proceed in one consistent direction, which means that w̄s is more important than w̃s. Moreover, for the U → M, W → A, and C → R tasks in the non-IID case, the corresponding best performances were achieved with α values of 0.9, 0.7, and 0.9, respectively. Thus, in the non-IID scenarios, more attention should be paid to the contributions of clients during the aggregation process, as w̃s plays an important role.

7) Additional Discussion on KD3A: KD3A needs to calculate the total consensus quality for each coalition of source domain clients. The complexity of this process is on the order of the number of combinations of the N source domain clients, O(C_N^{N/2}). Therefore, a considerable amount of time would be required to federate the 50 clients on Digits. We made changes to KD3A to compare it with FOSDA: in this comparison, only ten source domain clients participated in each aggregation. The results are presented in Table VIII. In the IID scenarios, FOSDA was 4.0% higher than KD3A on OS
and 3.5% higher on OS*. In the non-IID scenarios, FOSDA was 5.1% higher than KD3A on OS and 5.6% higher on OS*. FOSDA achieved better performance than KD3A on all non-IID tasks.

[Fig. 9: Comparison of results with different α. For Digits, the U → M task was selected; for Office-31, the W → A task; and for Office–Home, the C → R task. (a) Digits U → M task. (b) Office-31 W → A task. (c) Office–Home C → R task.]
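To make this cost concrete, the number of coalition evaluations scales with the central binomial coefficient, which explodes long before N = 50; the snippet below evaluates it for a few N (illustrative arithmetic only, not code from the paper):

from math import comb

# The largest number of coalitions of a given size over N source
# clients is the central binomial coefficient C(N, N/2).
for n in (10, 20, 50):
    print(n, comb(n, n // 2))
# 10 -> 252; 20 -> 184756; 50 -> 126410606437752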
V. CONCLUSION

In this study, we proposed a novel FL approach that accomplishes OSDA tasks without the need for data sharing among local clients, thereby protecting privacy. A federated self-weighted strategy based on diversity and consistency was proposed to adaptively aggregate the models from different sources according to their importance to the task. Moreover, a federated class-based weighted strategy was implemented to selectively focus on the neurons corresponding to those categories that are available on a local client. These strategies can flexibly select reliable sources, which is especially effective for non-IID tasks. The experimental results demonstrated that the proposed model achieves superior performance compared with state-of-the-art FL methods. The proposed uncertainty-aware aggregation method is suitable for classification or recognition tasks in which data with different distributions are scattered among multiple clients. A large number of real-world applications, such as automatic driving, medical image processing, and financial data analysis, fall into this category and are expected to benefit from the FL approach.

The proposed FOSDA framework also has several limitations. First, it focuses only on the task in the target domain. A potential future direction is to consider personalized federation, so that the trained clients retain as much personalized information as possible and improve their own abilities while achieving the domain adaptation task. Second, in the experimental setup of this study, the data of the multiple source-domain clients were sampled from the same domain, which might not hold in real-world applications. Further evaluation of the proposed framework in a more realistic setting will be needed in the future.

REFERENCES

[1] J. Lu, V. Behbood, P. Hao, H. Zuo, S. Xue, and G. Zhang, "Transfer learning using computational intelligence: A survey," Knowl.-Based Syst., vol. 80, pp. 14–23, May 2015.
[2] S. Ben-David et al., "Analysis of representations for domain adaptation," in Proc. Adv. Neural Inf. Process. Syst., vol. 19, 2007, pp. 137–144.
[3] M. Long, Y. Cao, J. Wang, and M. Jordan, "Learning transferable features with deep adaptation networks," in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 97–105.
[4] M. Long, H. Zhu, J. Wang, and M. I. Jordan, "Unsupervised domain adaptation with residual transfer networks," in Proc. Int. Conf. Neural Inf. Process. Syst., 2016, pp. 136–144.
[5] Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 1180–1189.
[6] M. Long, Z. Cao, J. Wang, and M. I. Jordan, "Conditional adversarial domain adaptation," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1647–1657.
[7] J. Liu and Y. Ren, "A general transfer framework based on industrial process fault diagnosis under small samples," IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6073–6083, Sep. 2021.
[8] Y. Ren, J. Liu, Q. Wang, and H. Zhang, "HSELL-Net: A heterogeneous sample enhancement network with lifelong learning under industrial small samples," IEEE Trans. Cybern., early access, Mar. 22, 2022, doi: 10.1109/TCYB.2022.3158697.
[9] Y. Ren, J. Liu, H. Zhang, and J. Wang, "TBDA-Net: A task-based bias domain adaptation network under industrial small samples," IEEE Trans. Ind. Informat., vol. 18, no. 9, pp. 6109–6119, Sep. 2022.
[10] Y. Ren, J. Liu, Y. Chen, and W. Wang, "LJDA-Net: A low-rank joint domain adaptation network for industrial sample enhancement," IEEE Sensors J., vol. 22, no. 12, pp. 11881–11891, Jun. 2022.
[11] P. P. Busto and J. Gall, "Open set domain adaptation," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 754–763.
[12] K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada, "Open set domain adaptation by backpropagation," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 153–168.
[13] H. Liu, Z. Cao, M. Long, J. Wang, and Q. Yang, "Separate to adapt: Open set domain adaptation via progressive separation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2927–2936.
[14] L. Zhong, Z. Fang, F. Liu, B. Yuan, G. Zhang, and J. Lu, "Bridging the theoretical bound and deep algorithms for open set domain adaptation," 2020, arXiv:2006.13022.
[15] T. Shermin, G. Lu, S. W. Teng, M. Murshed, and F. Sohel, "Adversarial network with multiple classifiers for open set domain adaptation," IEEE Trans. Multimedia, vol. 23, pp. 2732–2744, 2021.
[16] Y. Xu, L. Chen, L. Duan, I. W. Tsang, and J. Luo, "Open set domain adaptation with soft unknown-class rejection," IEEE Trans. Neural Netw. Learn. Syst., early access, Aug. 30, 2021, doi: 10.1109/TNNLS.2021.3105614.
[17] R. Li, Q. Jiao, W. Cao, H.-S. Wong, and S. Wu, "Model adaptation: Unsupervised domain adaptation without source data," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9641–9650.
[18] Y. Kim, D. Cho, K. Han, P. Panda, and S. Hong, "Domain adaptation without source data," IEEE Trans. Artif. Intell., vol. 2, no. 6, pp. 508–518, Dec. 2021.
[19] S. Yang, Y. Wang, J. van de Weijer, L. Herranz, and S. Jui, "Casting a BAIT for offline and online source-free domain adaptation," 2020, arXiv:2010.12427.
[20] Z. Qiu et al., "Source-free domain adaptation via avatar prototype generation and adaptation," 2021, arXiv:2106.15326.
[21] J. Liang, D. Hu, and J. Feng, "Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation," in Proc. 37th Int. Conf. Mach. Learn., 2020, pp. 6028–6039.
[22] J. N. Kundu, N. Venkat, A. Revanur, M. V. Rahul, and R. V. Babu, "Towards inheritable models for open-set domain adaptation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 12376–12385.
[23] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Proc. Int. Conf. Artif. Intell. Statist., 2017, pp. 1273–1282.
[24] L. Wang, W. Wang, and B. Li, "CMFL: Mitigating communication overhead for federated learning," in Proc. IEEE 39th Int. Conf. Distrib. Comput. Syst. (ICDCS), Jul. 2019, pp. 954–964.
[25] K. Burlachenko, S. Horváth, and P. Richtárik, "FL_PyTorch: Optimization research simulator for federated learning," 2022, arXiv:2202.03099.
[26] A. Eslami Abyane, D. Zhu, R. Medeiros de Souza, L. Ma, and H. Hemmati, "Towards understanding quality challenges of the federated learning: A first look from the lens of robustness," 2022, arXiv:2201.01409.
[27] K. Bonawitz et al., "Practical secure aggregation for privacy-preserving machine learning," in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2017, pp. 1175–1191.
[28] P. Mohassel and Y. Zhang, "SecureML: A system for scalable privacy-preserving machine learning," in Proc. IEEE Symp. Secur. Privacy (SP), May 2017, pp. 19–38.
[29] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, and Y. Gao, "A survey on federated learning," Knowl.-Based Syst., vol. 216, Mar. 2021, Art. no. 106775.
[30] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, "Federated learning with non-IID data," 2018, arXiv:1806.00582.
[31] P. Xiao, S. Cheng, V. Stankovic, and D. Vukobratovic, "Averaging is probably not the optimum way of aggregating parameters in federated learning," Entropy, vol. 22, no. 3, p. 314, 2020.
[32] G. Wang, C. X. Dang, and Z. Zhou, "Measure contribution of participants in federated learning," in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2019, pp. 2597–2604.
[33] Q. Li, B. He, and D. Song, "Model-contrastive federated learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 10708–10717.
[34] X. Peng, Z. Huang, Y. Zhu, and K. Saenko, "Federated adversarial domain adaptation," 2019, arXiv:1911.02054.
[35] H.-Z. Feng et al., "KD3A: Unsupervised multi-source decentralized domain adaptation via knowledge distillation," 2020, arXiv:2011.09757.
[36] Y. Mansour, M. Mohri, J. Ro, and A. T. Suresh, "Three approaches for personalization with applications to federated learning," 2020, arXiv:2002.10619.
[37] S. Boughorbel, F. Jarray, N. Venugopal, S. Moosa, H. Elhadi, and M. Makhlouf, "Federated uncertainty-aware learning for distributed hospital EHR data," 2019, arXiv:1910.12191.
[38] X. Liang, Y. Liu, T. Chen, M. Liu, and Q. Yang, "Federated transfer reinforcement learning for autonomous driving," 2019, arXiv:1910.06001.
[39] F. Zheng, K. Li, J. Tian, and X. Xiang, "A vertical federated learning method for interpretable scorecard and its application in credit scoring," 2020, arXiv:2009.06218.
[40] A. Taik, H. Moudoud, and S. Cherkaoui, "Data-quality based scheduling for federated edge learning," in Proc. IEEE 46th Conf. Local Comput. Netw. (LCN), Oct. 2021, pp. 17–23.
[41] R. Giancarlo, D. Scaturro, and F. Utro, "Computational cluster validation for microarray data analysis: Experimental assessment of clest, consensus clustering, figure of merit, gap statistics and model explorer," BMC Bioinf., vol. 9, no. 1, pp. 1–19, Dec. 2008.
[42] J. Kiefer and J. Wolfowitz, "Stochastic estimation of the maximum of a regression function," Ann. Math. Statist., vol. 23, no. 3, pp. 462–466, Sep. 1952.
[43] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in Proc. NIPS Workshop Deep Learn. Unsupervised Feature Learn., 2011, pp. 1–9.
[44] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[45] J. J. Hull, "A database for handwritten text recognition research," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 5, pp. 550–554, May 1994.
[46] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Proc. Eur. Conf. Comput. Vis., 2010, pp. 213–226.
[47] Q. Wang and T. Breckon, "Unsupervised domain adaptation via structured prediction based selective pseudo-labeling," in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 4, pp. 6243–6250.
[48] H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan, "Deep hashing network for unsupervised domain adaptation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5018–5027.

Zixuan Qin is currently pursuing the master's degree with the College of Intelligence and Computing, Tianjin University, Tianjin, China. His research interests include federated learning.

Liu Yang (Member, IEEE) received the Ph.D. degree in computer science from the School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China, in 2016. She is currently an Associate Professor with the College of Intelligence and Computing, Tianjin University, Tianjin, China. Her research interests include data mining and machine learning.

Fei Gao received the M.S. degree in computer science from the College of Intelligence and Computing, Tianjin University, Tianjin, China, in 2022. Her research interests include transfer learning and federated learning.

Qinghua Hu (Senior Member, IEEE) received the B.S., M.S., and Ph.D. degrees from Harbin Institute of Technology, Harbin, China, in 1999, 2002, and 2008, respectively. He was a Post-Doctoral Fellow with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong, from 2009 to 2011. He is currently the Dean of the School of Artificial Intelligence, Tianjin, China, the Vice Chairman of the Tianjin Branch of the China Computer Federation, Tianjin, and the Vice Director of the Special Interest Group (SIG) on Granular Computing and Knowledge Discovery, Tianjin. He is currently supported by the Key Program, National Natural Science Foundation of China. He has authored or coauthored over 200 peer-reviewed papers. His current research interests include uncertainty modeling in big data, machine learning with multimodality data, and intelligent unmanned systems. Dr. Hu is an Associate Editor of the IEEE TRANSACTIONS ON FUZZY SYSTEMS, Acta Automatica Sinica, and Energies.

Chenyang Shen received the B.Sc. degree from Yangzhou University, Yangzhou, China, in 2010, and the M.Phil. and Ph.D. degrees from Hong Kong Baptist University, Hong Kong, in 2012 and 2015, respectively. He is currently with the Division of Medical Physics and Engineering, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA. His current research interests include medical imaging, scientific computing, data mining, and deep learning.