6 - Uncertainty-Aware Aggregation For Federated Open Set Domain Adaptation
6, JUNE 2024
Abstract— Open set domain adaptation (OSDA) methods have been proposed to leverage the difference between the source and target domains, as well as to recognize the known and unknown classes in the target domain. Such methods typically require the entire source and target data simultaneously to train the target model. However, in real scenarios, data are distributed and stored in various clients. They cannot be exchanged among clients because of privacy protection. Federated learning (FL) is a decentralized approach for training an effective global model with the training data distributed among the clients. Despite its potential in addressing the privacy concerns of data sharing, FL methods for OSDA that can handle unknown classes are not yet available. To tackle this problem, we have developed a novel federated OSDA (FOSDA) algorithm. More specifically, FOSDA adopts an uncertainty-aware mechanism to generate a global model from all client models. It reduces the uncertainty of the federated aggregation by focusing on the contribution of source clients with high uncertainty while retaining those with high consistency. Moreover, a federated class-based weighted strategy is also implemented in FOSDA to maintain the category information of the source clients. We have conducted comprehensive experiments on three benchmark datasets to evaluate the performance of the proposed method, and the results demonstrate the effectiveness of FOSDA.

Index Terms— Federated learning (FL), open set domain adaptation (OSDA), uncertainty-aware mechanism, weighted strategy.

I. INTRODUCTION

the recognition performance in the unlabeled target domain. Domain adaptation methods have been applied to solve small sample problems in various scenarios [7], [8], [9], [10]. In general, domain adaptation only considers a situation in which the source domain has the same classes as the target domain. However, in reality, new categories that are not observed in the source domain are often encountered in the target domain. Thus, the environment of these models changes from the previous closed set to an open set. Simultaneously, these unknown categories interfere with the transfer effect.

To solve this problem, several open set domain adaptation (OSDA) methods [11], [12], [13], [14], [15], [16] have been proposed to deal with the problem of new classes. The goal of OSDA methods is to identify the data of the common classes between the source and target domains in the target domain and to reject the data of all unknown categories in the target domain. The existing methods benefit from shared source and target data. However, in real-application scenarios, the data of the source and target domains cannot be shared, which leads to the failure of domain-adaptive methods.

In several recent source-free works [17], [18], [19], [20], the model was trained without the domains accessing the source and target data from one another. The model was first trained with the source domain and then further trained with the target domain. Thus, the target domain would not
Authorized licensed use limited to: Universita degli Studi di Roma Tor Vergata. Downloaded on December 07,2024 at 18:04:01 UTC from IEEE Xplore. Restrictions apply.
QIN et al.: UNCERTAINTY-AWARE AGGREGATION FOR FEDERATED OPEN SET DOMAIN ADAPTATION 7549
7550 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 35, NO. 6, JUNE 2024
Fig. 2. Framework of the proposed FOSDA, which includes four main behaviors: downloading the global model from the target domain client to the source domain clients, local model updating, uploading client models from the source domain clients to the target domain client, and federated aggregation. D_1, ..., D_n correspond to source domain clients 1 to n, and D_T denotes the target domain client.
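
The four behaviors named in the Fig. 2 caption can be sketched as one communication round. This is only an illustrative sketch, not the FOSDA implementation: the names `run_round`, `aggregate`, `toward_mean`, and `weight_fn` are hypothetical, the local update is a toy step toward the client data mean, and the aggregation weights are supplied by the caller rather than computed with the paper's uncertainty-aware strategy.

```python
from typing import Callable, List, Sequence

Params = List[float]
Dataset = Sequence[Sequence[float]]


def aggregate(client_params: Sequence[Params], weights: Sequence[float]) -> Params:
    """Weighted average of client parameter vectors (weights are normalized here)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(client_params[0])
    return [
        sum(w * p[i] for w, p in zip(weights, client_params)) / total
        for i in range(dim)
    ]


def toward_mean(params: Params, data: Dataset, lr: float = 0.5) -> Params:
    """Toy local update (assumption): move parameters toward the client data mean."""
    dim = len(params)
    mean = [sum(x[i] for x in data) / len(data) for i in range(dim)]
    return [p + lr * (m - p) for p, m in zip(params, mean)]


def run_round(
    global_params: Params,
    client_datasets: Sequence[Dataset],
    local_update: Callable[[Params, Dataset], Params],
    weight_fn: Callable[[Params], float],
) -> Params:
    """One round: download, local update, upload, aggregate."""
    uploaded = []
    for data in client_datasets:
        downloaded = list(global_params)          # 1) download the global model
        updated = local_update(downloaded, data)  # 2) local model updating
        uploaded.append(updated)                  # 3) upload the client model
    weights = [weight_fn(p) for p in uploaded]    # placeholder for any weighting rule
    return aggregate(uploaded, weights)           # 4) federated aggregation


if __name__ == "__main__":
    clients = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
    print(run_round([0.0, 0.0], clients, toward_mean, lambda p: 1.0))
```

Passing the weighting rule as `weight_fn` keeps the loop independent of how client contributions are scored, so a data-size rule or an uncertainty-based rule could be substituted without changing the round structure.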
uses the feature transformation capabilities of deep neural networks to match the feature spaces. The residual transfer network [4] applies a shortcut connection and entropy minimization criterion to the DAN. With the widespread use of generative adversarial models, the domain adversarial neural network [5] was developed. This network uses adversarial learning to train a domain classifier so that the classifier, which is trained using labeled source domain data, is suitable for the target domain. Although conditional adversarial domain adaptation [6] also employs adversarial learning, it uses the tensor product between the feature representation and classifier prediction to improve the classifier recognition. Recently, domain adaptation methods have been applied to solve small sample industrial problems. A generalized transfer framework [7] with evolutionary capability has been used for fault diagnosis. Several domain adaptation networks [8], [9], [10] have been proposed to align feature dimensions and match feature distributions in small sample scenarios.

Furthermore, in realistic settings, new categories will be encountered in the target domain, and it is necessary to respond sufficiently to these new categories, which is an OSDA task. The assign and transform iterative method [11] uses metric learning to assign unknown samples iteratively, by means of which the distances between the target samples and the center of the categories are calculated. However, in this setting, it is necessary to collect several private categories in the source domain to represent the new categories in the target domain. The open set backpropagation (OSBP) method [12] overcomes the limitation of having corresponding new categories in the source domain. A gradient reversal layer is established for the generator to maximize the classifier error. The generator can reject samples as unknown or align them with the source data. Separate to adapt [13] uses a coarse-to-fine weight mechanism to separate unknown samples from the target domain and subsequently align the target and source domains. The -open set difference [14] was introduced based on the theoretical bound of the OSDA. These works set a hard threshold to identify unknown classes in advance; thus, several methods have been proposed to improve this situation. The method in [15] used a multiclassifier structure to learn this threshold for each sample automatically. Xu et al. [16] introduced the entropy of probability distributions to set a soft threshold for rejecting unknown samples. A comparison of the ablation experiments of these studies demonstrates that OSBP played the most important role. However, in most applications, it is difficult for these methods to share data in different domains to train a unified model.

Source-free unsupervised domain adaptation (SFUDA) has been proposed to deal with the scenario in which the source domain data do not exist in the process of target adaptation. SFUDA achieves privacy protection by isolating the source and target data. The unlabeled target data cannot be used to fine-tune the pretrained source model. Model adaptation [17] uses a class-conditional GAN to generate target-style data, and the GAN and pretrained source model are trained cooperatively. However, the training process consumes a large amount of computing resources. Several class prototype-based methods have been proposed for training the models. In source data-free domain adaptation [18], reliable target samples are selected as class prototypes, and pseudolabels are assigned to the samples based on the similarities between the samples and class prototypes. BAIT [19] trains the classifiers to obtain the source and target prototypes, and pulls the target features toward the prototypes so that the target data are aligned with the source classifier. In contrastive prototype generation and adaptation [20], class prototypes are generated for the source classes via contrastive learning, and pseudolabels are used to align the target data with the class prototypes. However, these methods only focus on the domain adaptation problem and they cannot deal with the unknown categories in open set scenarios.

Source hypothesis transfer (SHOT) [21] retains the source domain hypothesis and applies the semi-supervised pseudolabel strategy during target adaptation. It can be extended to
handle new categories without sharing the source data. The inheritable vendor–client (IVC) paradigm [22] can also solve OSDA problems. It generates negative samples as out-of-distribution with the aid of source data. Subsequently, the IVC trains the source model with the source data and negative samples together, and pseudolabeling is applied during the target adaptation. However, these methods only transfer information from one client to another and they are not suitable for aggregating information from all clients.

B. Federated Learning

With the development of terminal devices, data are being distributed and stored in a large number of private clients. Owing to privacy and security policies, clients cannot share data with one another. Therefore, it is necessary to train a model with clients in a distributed manner instead of directly pooling the data of all clients together. FL [23], [24] is an approach that enables multiple clients to jointly build a global model, and it meets the requirements of privacy and security, where each client can only update the model using local data.

FL has been widely applied in various areas because of its ability to provide privacy protection [29]. Google established an FL network among mobile users to improve the quality of keyboard input prediction and promote the development of the recommendation system [36]. The federated diagnosis of electronic health records (EHR) [37] is used in different hospitals by reducing the contribution of models with high uncertainty. Federated transfer reinforcement learning [38] is designed for autonomous driving, and it can transfer the knowledge of agents in real time. An FL framework [39] is used to jointly train optimized credit card models in order to increase the accuracy of customer risk determination.

A research issue in FL is the determination of a set of suitable client models and the design of aggregation strategies. Averaging is a classic strategy that combines all of the models from local clients. FedAvg [23] is an effective FL strategy, by means of which all client models in the federated aggregation stage are weighted according to the amount of data that is local to the client. All of the clients are at an equal level. During the training process of FedAvg, a certain number of clients are selected to attend aggregation in every iteration, and the number of local iterations of different clients is the same. Communication-mitigated federated learning (CMFL) [24] is a generalization of FedAvg that requires relevant client updates to be uploaded to participate in the global model aggregation and ignores irrelevant client updates. Relevant client updates are identified by calculating the percentage of client updates with different signs compared with their counterparts in the global update. If the percentage is greater than a certain threshold, the client update is considered relevant. However, it has been proven that this naive averaging of model parameters can affect the federated performance [30], particularly when the data follow a highly skewed non-IID distribution. A data-quality-based scheduling algorithm with uncertainty measures [40] has been proposed to prioritize reliable devices with rich and diverse datasets, but it still needs to be evaluated on a publicly available dataset. The federated uncertainty-aware learning algorithm [37] was proposed to improve FedAvg in the context of EHR. In this approach, the clients’ local data are used as a test set to estimate the federated model performance, and hence it is not suitable for domain adaptation scenarios. The distance between the client parameters, such as the Euclidean distance, was computed in [31], and more correlated parameters were selected for model integration. To a certain extent, it can be confirmed that the averaging strategy may not be an optimal approach. The contribution of each client offers unexpected benefits in FL. The Shapley value [32] is used to measure the contribution of different clients to FL. Model-contrastive FL (MOON) [33] was proposed to further resolve the heterogeneity of the local data distribution across parties. The key concept of MOON is to use the similarity between the model representations to correct the local training of the clients. In the local training, MOON corrects the updating direction by introducing the MOON loss, the objective of which is to reduce the distance between the current local and global models, as well as to increase the distance between the current and previous local models, thereby reducing the impact of the non-IID distribution. Although MOON takes into account the problem of non-IID distribution, it does not consider domain adaptation.

FL has been applied to domain adaptation tasks in several studies. FADA [34] was proposed to solve multisource domain adaptation tasks in an FL system. This method provides a detailed analysis of the convergence of different client updates. KD3A [35] performs domain adaptation through knowledge distillation on models from different source domains under a privacy-preserving policy. Specifically, it designs a new domain to obtain consistent knowledge from every source domain and adds this domain to the final model aggregation process. However, these methods cannot handle OSDA in a federated setting. Moreover, they only use naive averaging of the model parameters, which affects the federated performance, especially for non-IID data. We consider the aggregation of all clients based on uncertainty to train an OSDA model for the target client without data sharing.

III. METHODOLOGY

In this section, we present the proposed FOSDA framework. As illustrated in Fig. 2, the entire FOSDA process is divided into four stages: downloading a global model from the target domain client to the source domain clients, updating the local model with the source domain client data, uploading the client local models and related information to the target domain client, and implementing federated aggregation with the corresponding uncertainty-aware strategy. The local model updating and federated aggregation stages are the most critical stages of the FOSDA. First, we define the OSDA problem in a federated environment. Thereafter, the local model updating and federated aggregation stages in the FOSDA are described in detail.

A. Problem Statement

FOSDA is proposed to unite scattered source domain clients to train a global model for OSDA tasks on target domain
TABLE I
SUMMARY OF NOTATIONS
factor in addition to $\bar{w}_s$. This factor means that if the model deviates from all client models, we believe that the model will have adverse effects and assign it a small value. Specifically, we calculate an average model of the client models following the local model updating to represent the performance of almost all clients. Thus, the gap statistics of the $\tau$th average model are determined by

$$(\bar{\mu})^{\tau} = \sum_{r=1}^{R} \frac{1}{2 n_{r}} \sum_{a,b \in C_{r}} \left\| \bar{f}_{a}^{\tau} - \bar{f}_{b}^{\tau} \right\|_{2}. \tag{10}$$

The difference between the gap statistics of the client model and those of the average model can reveal the status of the local client

$$(\bar{w}_{s})^{\tau} = \left| \bar{\mu}^{\tau} - (\mu_{s})^{\tau} \right|. \tag{11}$$

federated self-weighted strategy. For example, in Fig. 1, the probabilities that a sample belongs to “alarm clock,” “bike,” “chair,” “fork,” and “unknown” constitute the output of the final fully connected layer and each class corresponds to a neuron. As the first client has the labels “alarm clock,” “bike,” and “chair,” the model will be more sensitive to data with these labels. The related parameters in the final fully connected layer are reserved, which is formalized as $\mathrm{sign}(\theta_{C_{s}^{k}}) = 1$ ($k = 1, 2, 3$). However, as the first client has no labeled data with “fork,” the related parameters in the final fully connected layer will be discarded, which is formalized as $\mathrm{sign}(\theta_{C_{s}^{k}}) = 0$ ($k = 4$). Because each client model is beneficial for identifying an unknown class, the relevant parameters of the client should be preserved. The final fully connected layer can be aggregated as
TABLE II
DESCRIPTION OF DATASETS
Digits [43] consists of three domains: modified NIST (MNIST) (M) [44], Street View House Numbers (SVHN) (S), and United States Postal Service (USPS) (U) [45]. Each domain corresponds to ten categories. Following the previous work [12], we selected 0–4 as the known classes and regarded 5–9 as unknown classes to satisfy the open set requirements. The transfer tasks included S → M, M → U, and U → M. For the U → M task, U as the source domain had only 145 samples, whereas M had 60 000 samples. The number of samples in the source and target domains was severely unbalanced. Moreover, under the federated experimental setting, the samples of U would be divided among multiple clients, which would cause the imbalance to be more obvious and affect the overall performance. Therefore, in the following experiments, we selected 10% of the M samples by category as a new dataset to reduce the imbalance.

Office-31 [46] is a standard domain adaptation dataset that contains three domains: Amazon (A), Dslr (D), and Webcam (W). The features of Office-31 were extracted using ResNet50, which was applied in [47], and the feature dimension was 2048. For the OSDA tasks, we followed a previous work [12] to select the first ten categories in alphabetical order as the known classes and categories 21–31 as the unknown classes.

Office–Home [48] is frequently used for OSDA tasks. In the following experiments, we used the features that were extracted using ResNet50, which was applied in [47]. In comparison with Office-31, the OSDA tasks were more challenging in Office–Home. Artistic (A), clipart (C), product (P), and real-world (R) constitute Office–Home, and the number of categories in this dataset is 65. We selected the first 25 categories as the known classes and the remainder as the unknown classes.

B. Experimental Setting

In this study, we applied FOSDA to solve OSDA problems in a federated setting. All source domain clients were sampled from one source domain and the target domain was considered an independent client. Specifically, there were a total of 50 source clients for the tasks on Digits. A total of 30 clients were randomly selected during the federated aggregation of the FOSDA. Owing to the small number of samples and the large number of classes, we placed source data on ten clients for Office-31 and Office–Home and allowed all clients to participate in the aggregation in each round. To verify the effectiveness of the FOSDA further, experiments were conducted considering two scenarios: IID and non-IID. IID was a scenario in which the source data were shuffled and then divided equally among the source domain clients; the data categories owned by the clients were approximately the same. In the non-IID setting, the aim was for each client to have its own private categories; thus, we applied the strategy used in [23]. For example, client 1 could only have two categories of numbers 0 and 1, whereas client 2 could only have two categories of numbers 3 and 4. PyTorch was used to run the experiment along with a stochastic gradient descent optimizer with a learning rate of 0.0001. The batch size was set to 32. The number of communication rounds was set to 20 for Digits and 50 for Office-31/Office–Home.

C. Compared Methods

We conducted comparative experiments from two perspectives: 1) the FL methods FedAvg [23], CMFL [24], and MOON [33], and weighted FedAvg methods based on the Euclidean distance (FedEuc) [31] and cosine similarity (FedCos) and 2) the three domain adaptation methods SHOT [21], IVC [22], and KD3A [35], which could be extended to solve the federated OSDA problem. None of the compared methods, except for SHOT and IVC, has the ability to deal with unknown categories. Therefore, we followed [13] and used an automatic confidence threshold to determine whether a sample belongs to an unknown class in the testing stage.

1) FedAvg [23] is a prominent classical averaging FL method. All client models are weighted according to their own amount of data in the federated aggregation stage.
2) CMFL [24] is an FL method that simply considers relevant client updates. Prior to the federated aggregation stage, it verifies how well the client update aligns with the global update, and it selects relevant client updates to participate in the federated aggregation.
3) MOON [33] reduces the impact of the data distribution by contrasting the distances between the current local model and global model as well as the previous local model on each client.
4) FedEuc [31] and FedCos are two weighted variants of FedAvg based on the Euclidean distance or cosine similarity between the client model and the average weighted model. A smaller distance or greater similarity results in greater weight.
5) SHOT [21] is an SFUDA method that retains the source domain hypothesis and uses the pseudolabel strategy during the target adaptation without exposing the source domain data.
6) IVC [22] is an inheritable vendor–client paradigm, which was proposed to transfer the model information from one client to another without exposing the source domain data.
7) KD3A [35] performs domain adaptation through knowledge distillation on models from different source domains under a privacy-preserving policy.

D. Evaluation Metrics

Following a previous work [12], two evaluation metrics were used: the average accuracy among all classes (OS)
and the average accuracy among known classes (OS*). The accuracy of the unknown classes (UNK) was also considered to assess the recognition of the unknown classes

$$\mathrm{Acc}(\mathrm{OS}^{*}) = \frac{1}{K} \sum_{k=1}^{K} \frac{\left| \{ x^{t} \in D_{k}^{T} \wedge \hat{y}_{k} = y_{k} \} \right|}{\left| D_{k}^{T} \right|} \tag{22}$$

$$\mathrm{Acc}(\mathrm{OS}) = \frac{1}{K+1} \sum_{k=1}^{K+1} \frac{\left| \{ x^{t} \in D_{k}^{T} \wedge \hat{y}_{k} = y_{k} \} \right|}{\left| D_{k}^{T} \right|} \tag{23}$$

$$\mathrm{Acc}(\mathrm{UNK}) = \frac{\left| \{ x^{t} \in D_{K+1}^{T} \wedge \hat{y}_{K+1} = y_{K+1} \} \right|}{\left| \{ x^{t} \in D_{K+1}^{T} \} \right|} \tag{24}$$

where $\hat{y}$ is the predicted result and $D_{k}^{T}$ is the set of target samples with the label $y_{k}$.

E. Results

The results of Digits are summarized in Table III. We conducted comparative experiments on two scenarios with three tasks: S → M, U → M, and M → U. In the IID case, FOSDA outperformed other FL methods in the different evaluation metrics. FOSDA achieved 76.5% on OS and 87.5% on OS*, which indicated an improvement of at least 7.0% on OS and 6.2% on OS* compared with the other methods. The tasks in the non-IID case were more difficult than those in the IID case. FOSDA could maintain its effectiveness, with a performance that was 3.5% and 3.2% higher on OS and OS*, respectively, than SHOT, which achieved suboptimal performance. The source domain for Digits was divided into 50 clients and the time complexity of KD3A was excessively high, as discussed later.

TABLE III
CLASSIFICATION ACCURACY (%) OF OSDA TASKS ON DIGITS

The results of Office-31 are listed in Table IV. A total of eight methods were used for the comparative experiments. KD3A was included and good performance was achieved. Similar to FOSDA, it considers the data information of the target domain client during the training process, but this is achieved through knowledge voting from the source domain clients to the target domain client. The OS and OS* of the Office-31 IID tasks performed by KD3A were 90.5% and 92.8%, respectively. However, FOSDA surpassed KD3A by 0.5% and 0.8% for OS and OS*, respectively. The improvements were more obvious for the Office-31 non-IID tasks: FOSDA surpassed KD3A by 4.2% and 2.0% on OS and OS*, respectively.

The results for Office–Home are presented in Table V. The tasks were more difficult in Office–Home; there were 25 known and 40 unknown classes, and the number of categories was increased. However, the FOSDA could maintain relatively good performance. In the IID case, FOSDA achieved the best results for OS and OS*. In particular, FOSDA achieved better performance than the other methods for nine out of 12 transfer tasks. In the non-IID case, FOSDA continued to perform well, with an improvement of at least 2.0% on OS and 2.0% on OS* compared with the other methods.

F. Analysis

1) Domain Discrepancy Between Clients: The A-distance is a measure of the domain discrepancy that was used in [3]. The A-distance is expressed as $\hat{d}_{A} = 2(1 - 2\epsilon)$, where $\epsilon$ is the generalization error of a two-sample classifier (e.g., the kernel SVM) that is trained on the binary problem of distinguishing the input samples between the source and target domains. The A-distance was used to verify the difficulty of the task between the source and target domain clients. The W → A and W → D tasks of Office-31, and the P → C and R → P tasks of Office–Home were selected as examples. The A-distance between the source and target domain clients as well as the corresponding accuracy are depicted in Fig. 6. Fig. 6(a) and (b) present the results of Office-31 in the IID and non-IID cases, respectively. All but one W → D source client had a smaller A-distance. Correspondingly, FOSDA could obtain a higher OS and OS* on W → D than on W → A. R → P was easier for Office–Home, with a smaller A-distance. FOSDA obtained a higher OS and OS* on R → P than on P → C. It can be observed from Fig. 6 that a transfer task with a smaller A-distance was easier, and the accuracy was generally higher for such a task. In contrast, a task with a larger A-distance is more challenging.

2) Ability to Solve Open Set Problems: To demonstrate the ability of FOSDA to recognize unknown classes, the recognition accuracy of unknown classes is discussed separately. As illustrated in Fig. 7(a)–(c), for all of the IID cases, the accuracy of FOSDA on the unknown classes was higher than that of the other methods. In particular, in the Office–Home A → P IID task, FOSDA achieved 60.2% on UNK, which was an improvement of at least 45.7% compared with the other methods. Moreover, FOSDA outperformed the other methods by at least 4.4% on OS for all datasets. Unlike the other models, FOSDA trained an adversarial network to identify unknown classes for OSDA tasks in the client model updating stage. Therefore, FOSDA improved the recognition accuracy of the unknown classes while ensuring the average accuracy; the same held for the non-IID cases. Fig. 7(d)–(f) shows that in the non-IID cases, FOSDA still exhibited better performance compared with the other methods. In particular, UNK of FOSDA on Office–Home was at least 17.8% higher than that of the other methods.

3) Comparison of Different Weighted Strategies: The federated self-weighted and federated class-based weighted strategies are model aggregation strategies that are based on the
TABLE IV
CLASSIFICATION ACCURACY (%) OF OSDA TASKS ON OFFICE-31 (RESNET-50)
TABLE V
OS (%) OF OSDA TASKS ON OFFICE–HOME (RESNET-50)
TABLE VI
CLASSIFICATION ACCURACY (%) OF DIFFERENT WEIGHTED STRATEGIES ON OFFICE-31 TASK FOR NON-IID
characteristics of the target domain client data. To highlight the advantages of these two strategies, we made adjustments to FedEuc and FedCos, which are weighted federated methods, and renamed them FedEuc* and FedCos*, respectively. The client models in FedEuc and FedCos were replaced with those in FOSDA, so that FedEuc* and FedCos* differed from FOSDA only in their weighted strategies. Considering the tasks on Office-31 for non-IID as examples, as indicated in Table VI, FOSDA achieved the best results, followed by FedCos*, based on the cosine similarity. FedEuc* and FedCos* directly determined the distance between the client models as a measure of the model aggregation and ignored the task information in the target domain. However, FOSDA indirectly used the statistical characteristics of the target domain client data as a standard to measure the clients in the model aggregation, which enabled the final model to adapt to the task of the target domain client more effectively. FedEuc* and FedCos* essentially optimized the model toward the goal of an average model as far as possible, ignoring the contribution of each client model in every aggregation round. In FOSDA, this contribution is accounted for through (9) in the federated self-weighted strategy, which focuses on the contribution of each client. Furthermore, in the non-IID
Fig. 6. A-distance and corresponding accuracy on Office-31 W → A and W → D tasks in both IID and non-IID cases, and Office–Home P → C and
R → P tasks in both IID and non-IID cases. (a) Office-31 IID tasks. (b) Office-31 non-IID tasks. (c) Office–Home IID tasks. (d) Office–Home non-IID
tasks.
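
The A-distance plotted in Fig. 6 follows the relation d̂_A = 2(1 − 2ε) given in the analysis. A minimal sketch of that computation is shown here under stated assumptions: the paper mentions a kernel SVM as the two-sample classifier, whereas this sketch substitutes a much simpler nearest-centroid classifier with an even/odd train/test split, and the function names `proxy_a_distance` and `domain_classifier_error` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def proxy_a_distance(error: float) -> float:
    """Map a domain-classifier error in [0, 1] to 2 * (1 - 2 * error)."""
    if not 0.0 <= error <= 1.0:
        raise ValueError("error must lie in [0, 1]")
    return 2.0 * (1.0 - 2.0 * error)


def _centroid(points: Sequence[Vector]) -> List[float]:
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def domain_classifier_error(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Held-out error of a nearest-centroid source-vs-target classifier.

    Assumption: a stand-in for the kernel SVM named in the text. Even-indexed
    samples fit the centroids; odd-indexed samples are used for testing, so
    each domain needs at least two samples.
    """
    src_train, src_test = source[0::2], source[1::2]
    tgt_train, tgt_test = target[0::2], target[1::2]
    c_src, c_tgt = _centroid(src_train), _centroid(tgt_train)
    mistakes = 0
    for x in src_test:  # source sample wrongly assigned to the target domain
        if _sq_dist(x, c_tgt) < _sq_dist(x, c_src):
            mistakes += 1
    for x in tgt_test:  # target sample wrongly assigned to the source domain
        if _sq_dist(x, c_src) <= _sq_dist(x, c_tgt):
            mistakes += 1
    return mistakes / (len(src_test) + len(tgt_test))


if __name__ == "__main__":
    src = [[0.0], [0.1], [0.2], [0.3]]
    tgt = [[10.0], [10.1], [10.2], [10.3]]
    print(proxy_a_distance(domain_classifier_error(src, tgt)))
```

Well-separated domains give an error near 0 and a distance near 2, while indistinguishable domains give an error near 0.5 and a distance near 0, which matches the reading that a smaller A-distance indicates an easier transfer task.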
Fig. 7. Average accuracy among all classes (OS) and accuracy of the unknown classes (UNK) on the Digits M → U tasks in both the IID and non-IID
cases, on the Office-31 W → A tasks in both the IID and non-IID cases, and on the Office–Home A → P tasks in both the IID and non-IID cases.
(a) Digits M → U IID task. (b) Office-31 W → A IID task. (c) Office–Home A → P IID task. (d) Digits M → U non-IID task. (e) Office-31 W → A
non-IID task. (f) Office–Home A → P non-IID task.
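
The OS and UNK quantities reported in Fig. 7, together with OS*, are per-class accuracies averaged as in (22)–(24). A minimal sketch of how they might be computed from predictions follows; the function name `open_set_metrics` and the label convention (known classes encoded as 0 to K − 1, the unknown class as K) are assumptions made for illustration.

```python
from typing import Dict, Sequence


def open_set_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], num_known: int
) -> Dict[str, float]:
    """Compute OS, OS*, and UNK from integer labels.

    Assumed convention: labels 0..num_known-1 are known classes and the label
    num_known stands for the single "unknown" class.
    """
    per_class = []
    for k in range(num_known + 1):
        idx = [i for i, y in enumerate(y_true) if y == k]
        if not idx:
            raise ValueError(f"class {k} has no target samples")
        correct = sum(1 for i in idx if y_pred[i] == k)
        per_class.append(correct / len(idx))  # accuracy within class k
    return {
        "OS*": sum(per_class[:num_known]) / num_known,  # known classes only
        "OS": sum(per_class) / (num_known + 1),         # known plus unknown
        "UNK": per_class[num_known],                    # unknown class only
    }


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]
    preds = [0, 1, 1, 1, 2, 0]
    print(open_set_metrics(truth, preds, num_known=2))
```

Because each class contributes equally to the average regardless of its sample count, a method can score well on OS* while scoring poorly on UNK, which is why the unknown-class accuracy is reported separately.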
scenario, the data classes on the source domain clients were extremely imbalanced and the classification capabilities of each client for every class differed significantly. The federated class-based weighted strategy in FOSDA caused each source client to pay more attention to its ability to classify private class samples. FOSDA outperformed FedEuc* and FedCos* by 5.2% and 6.9% on OS and 5.0% and 6.4% on OS*, respectively. Thus, it is clear that after adjusting the local model, the performance of the federated model was improved significantly.

4) Ablation Study: We conducted a series of ablation experiments on the W → A task in the non-IID case to verify the importance of each term in the federated self-weighted and federated class-based weighted strategies. As introduced previously, the federated self-weighted strategy includes two items, $\tilde{w}_s$ and $\bar{w}_s$. To express this more clearly, the federated class-based weighted strategy is referred to as the FCWS. Fully considering the above-mentioned terms, the ablation experimental settings were as follows: 1) $\tilde{w}_s$; 2) $\bar{w}_s$; 3) FCWS; 4) $\tilde{w}_s$ + FCWS; 5) $\bar{w}_s$ + FCWS; and 6) FOSDA. The final experiment considered the combination of $\tilde{w}_s$, $\bar{w}_s$, and FCWS, none of which was ignored. Furthermore, $\alpha$ in (14) was 0.7 for this task, where $\alpha$ is a parameter to balance $\tilde{w}_s$ and $\bar{w}_s$.

The results of the ablation experiments are listed in Table VII. Among $\tilde{w}_s$, $\bar{w}_s$, and FCWS, the OS and OS* of the first were higher than those of the latter two, which demonstrates the importance of measuring the contribution of each client in this task. Comparing $\bar{w}_s$ and $\bar{w}_s$ + FCWS, it is clear that FCWS had a negative effect. However, comparing $\tilde{w}_s$ and $\tilde{w}_s$ + FCWS, the advantage of FCWS is evident. The final
Fig. 8. Comparison of the results of different methods. For Digits, the U → M task was selected, for Office-31, the W → A task was selected, and for
Office–Home, the C → R task was selected. (a) Digits U → M IID task. (b) Office-31 W → A IID task. (c) Office–Home C → R IID task. (d) Digits
U → M non-IID task. (e) Office-31 W → A non-IID task. (f) Office–Home C → R non-IID task.
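As a rough illustration of how α in (14) trades off the two self-weighted terms w̃s and w̄s, the sketch below assumes that (14) is a convex combination, α·w̃s + (1 − α)·w̄s, normalized across the source clients. That form, the helper names `combine_weights` and `aggregate`, and all numbers are assumptions made for illustration; they are not taken from the paper.

```python
def combine_weights(w_tilde, w_bar, alpha):
    """Mix two per-client scores and normalize so the weights sum to 1.

    Assumed form (not confirmed by the source):
    alpha * w_tilde + (1 - alpha) * w_bar.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(w_tilde) != len(w_bar):
        raise ValueError("both score lists need one entry per client")
    mixed = [alpha * t + (1.0 - alpha) * b for t, b in zip(w_tilde, w_bar)]
    total = sum(mixed)
    if total <= 0.0:
        raise ValueError("mixed scores must have a positive sum")
    return [m / total for m in mixed]


def aggregate(client_params, weights):
    """Weighted average of per-client parameter vectors (lists of floats)."""
    dim = len(client_params[0])
    return [sum(w * p[i] for w, p in zip(weights, client_params)) for i in range(dim)]


# Hypothetical scores and toy parameter vectors for three source clients.
w_tilde = [0.6, 0.3, 0.1]
w_bar = [0.2, 0.4, 0.4]
weights = combine_weights(w_tilde, w_bar, alpha=0.7)
global_params = aggregate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
```

Under this assumed form, a larger α moves the aggregation weights toward w̃s and a smaller α moves them toward w̄s.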
FOSDA achieved the best performance using the combination of w̃s, w̄s, and FCWS, with 77.2% on OS and 78.8% on OS*. In our method, all three items were selected. The importance of each item varies across tasks; we emphasize that each task has its own characteristics.

TABLE VII
ABLATION STUDY ON OFFICE-31 W → A TASK FOR NON-IID

5) Model Convergence Analysis: To demonstrate the convergence and advantages of FOSDA simultaneously, six tasks on three datasets were selected for experiments in the IID and non-IID cases, as illustrated in Fig. 8. In the IID case, FOSDA outperformed the other methods on all three tasks and converged efficiently over the global iterations. Fig. 8(d)–(f) shows that, in the non-IID case, FOSDA also achieved the highest accuracy in the respective tasks on all three datasets, and the model converged effectively.

In contrast, the FedAvg algorithm was less stable because it did not consider the contributions of individual clients in the aggregation process or the uncertainty of the aggregation, which resulted in large fluctuations in the accuracy during the training process. In addition, KD3A [35] was also unstable. KD3A was not designed for the open set problem; it directly discards the ambiguous classification results and focuses on the categories with higher confidence. Under the experimental scenario of OSDA, new categories appear in the target domain. The resulting increase in ambiguous classification results led to instability in KD3A. This situation also existed in the non-IID case.

6) Parameter Impact Analysis: The proposed model incorporates the federated self-weighted strategy, and the parameter α is used for balancing w̃s and w̄s in (14). In the experiments, α ranged from 0.1 to 0.9. It can be observed from Fig. 9 that α values of 0.1, 0.3, and 0.3 achieved the best performance for the U → M, W → A, and C → R tasks in the IID scenarios, respectively. The data distribution of each client was essentially the same; therefore, the model should be trained in one consistent direction, which means that w̄s is more important than w̃s. Moreover, for the U → M, W → A, and C → R tasks in the non-IID case, the corresponding best performances were achieved with α values of 0.9, 0.7, and 0.9, respectively. Thus, in the non-IID scenarios, more attention should be paid to the contributions of clients during the aggregation process, as w̃s plays an important role.

TABLE VIII
CLASSIFICATION ACCURACY (%) OF OSDA TASKS ON DIGITS

7) Additional Discussion on KD3A: KD3A needs to calculate the total consensus quality for each coalition of source domain clients. The complexity of this process is equivalent to the sum over the combinations of the N source domain clients: O(N^{N/2}). Therefore, a considerable amount of time was required to federate 50 clients on Digits. We made changes to KD3A to compare it with FOSDA. In this comparison, we set only ten source domain clients to participate in each aggregation. The results are presented in Table VIII. In the IID scenarios, FOSDA was 4.0% higher than KD3A on OS
Fig. 9. Comparison of results with different α. For Digits, the U → M task was selected, for Office-31, the W → A task was selected, and for Office–Home,
the C → R task was selected. (a) Digits U → M task. (b) Office-31 W → A task. (c) Office–Home C → R task.
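To give a sense of scale for the coalition enumeration mentioned in the comparison with KD3A, the back-of-the-envelope sketch below counts the non-empty subsets of N source clients. It assumes that every non-empty subset is a coalition whose consensus quality must be evaluated; that reading is an assumption made here, the helper name `num_coalitions` is hypothetical, and the code is not KD3A's implementation.

```python
from math import comb


def num_coalitions(n_clients):
    """Count non-empty subsets of n_clients by summing C(n, k) for k = 1..n."""
    return sum(comb(n_clients, k) for k in range(1, n_clients + 1))


# Compare the reduced setting (10 clients) with the full Digits setting (50 clients).
for n in (10, 50):
    print(n, num_coalitions(n))
```

Under this assumption, ten clients yield 1023 coalitions, whereas fifty clients yield 2^50 − 1 (about 1.1 × 10^15). This is consistent with restricting each aggregation to ten source domain clients in the comparison.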
and 3.5% higher on OS*. In the non-IID scenarios, FOSDA was 5.1% higher than KD3A on OS and 5.6% higher on OS*. FOSDA achieved better performance than KD3A for all non-IID tasks.

V. CONCLUSION

In this study, we proposed a novel FL approach to accomplish OSDA tasks without the need for data sharing among local clients in order to protect privacy. A federated self-weighted strategy based on diversity and consistency was proposed to adaptively aggregate the models from different sources based on their importance to the task. Moreover, a federated class-based weighted strategy was implemented to selectively focus on the neurons corresponding to those categories that are available on a local client. These strategies can flexibly select reliable sources, which is particularly effective for non-IID tasks. The experimental results demonstrated that the proposed model could achieve superior performance compared with state-of-the-art FL methods. The proposed uncertainty-aware aggregation method is suitable for classification or recognition tasks where data with different distributions are scattered among multiple clients. A large number of real-world applications, such as autonomous driving, medical image processing, and financial data analysis, fall into this category and are expected to benefit from the FL approach.

The proposed FOSDA framework also has several limitations. First, it only focuses on the task in the target domain. A potential future task would be to consider the perspective of the personalized federation so that the trained clients can retain as much personalized information as possible to improve their abilities while achieving the domain adaptation task. Second, in the experimental setup of this study, the data of multiple source–domain clients were sampled from the same domain, which might not hold in real-world applications. Further evaluation of the proposed framework in a more realistic setting will be needed in the future.

REFERENCES
[1] J. Lu, V. Behbood, P. Hao, H. Zuo, S. Xue, and G. Zhang, “Transfer learning using computational intelligence: A survey,” Knowl.-Based Syst., vol. 80, pp. 14–23, May 2015.
[2] S. Ben-David et al., “Analysis of representations for domain adaptation,” in Proc. Adv. Neural Inf. Process. Syst., vol. 19, 2007, pp. 137–144.
[3] M. Long, Y. Cao, J. Wang, and M. Jordan, “Learning transferable features with deep adaptation networks,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 97–105.
[4] M. Long, H. Zhu, J. Wang, and M. I. Jordan, “Unsupervised domain adaptation with residual transfer networks,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2016, pp. 136–144.
[5] Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 1180–1189.
[6] M. Long, Z. Cao, J. Wang, and M. I. Jordan, “Conditional adversarial domain adaptation,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1647–1657.
[7] J. Liu and Y. Ren, “A general transfer framework based on industrial process fault diagnosis under small samples,” IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6073–6083, Sep. 2021.
[8] Y. Ren, J. Liu, Q. Wang, and H. Zhang, “HSELL-Net: A heterogeneous sample enhancement network with lifelong learning under industrial small samples,” IEEE Trans. Cybern., early access, Mar. 22, 2022, doi: 10.1109/TCYB.2022.3158697.
[9] Y. Ren, J. Liu, H. Zhang, and J. Wang, “TBDA-Net: A task-based bias domain adaptation network under industrial small samples,” IEEE Trans. Ind. Informat., vol. 18, no. 9, pp. 6109–6119, Sep. 2022.
[10] Y. Ren, J. Liu, Y. Chen, and W. Wang, “LJDA-Net: A low-rank joint domain adaptation network for industrial sample enhancement,” IEEE Sensors J., vol. 22, no. 12, pp. 11881–11891, Jun. 2022.
[11] P. P. Busto and J. Gall, “Open set domain adaptation,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 754–763.
[12] K. Saito, S. Yamamoto, Y. Ushiku, and T. Harada, “Open set domain adaptation by backpropagation,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 153–168.
[13] H. Liu, Z. Cao, M. Long, J. Wang, and Q. Yang, “Separate to adapt: Open set domain adaptation via progressive separation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 2927–2936.
[14] L. Zhong, Z. Fang, F. Liu, B. Yuan, G. Zhang, and J. Lu, “Bridging the theoretical bound and deep algorithms for open set domain adaptation,” 2020, arXiv:2006.13022.
[15] T. Shermin, G. Lu, S. W. Teng, M. Murshed, and F. Sohel, “Adversarial network with multiple classifiers for open set domain adaptation,” IEEE Trans. Multimedia, vol. 23, pp. 2732–2744, 2021.
[16] Y. Xu, L. Chen, L. Duan, I. W. Tsang, and J. Luo, “Open set domain adaptation with soft unknown-class rejection,” IEEE Trans. Neural Netw. Learn. Syst., early access, Aug. 30, 2021, doi: 10.1109/TNNLS.2021.3105614.
[17] R. Li, Q. Jiao, W. Cao, H.-S. Wong, and S. Wu, “Model adaptation: Unsupervised domain adaptation without source data,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9641–9650.
[18] Y. Kim, D. Cho, K. Han, P. Panda, and S. Hong, “Domain adaptation without source data,” IEEE Trans. Artif. Intell., vol. 2, no. 6, pp. 508–518, Dec. 2021.
[19] S. Yang, Y. Wang, J. van de Weijer, L. Herranz, and S. Jui, “Casting a BAIT for offline and online source-free domain adaptation,” 2020, arXiv:2010.12427.
[20] Z. Qiu et al., “Source-free domain adaptation via avatar prototype generation and adaptation,” 2021, arXiv:2106.15326.
[21] J. Liang, D. Hu, and J. Feng, “Do we really need to access the source data? Source hypothesis transfer for unsupervised domain adaptation,” in Proc. 37th Int. Conf. Mach. Learn., 2020, pp. 6028–6039.
[22] J. Nath Kundu, N. Venkat, A. Revanur, M. V. Rahul, and R. Venkatesh Babu, “Towards inheritable models for open-set domain adaptation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 12376–12385.
[23] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Y. Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proc. Int. Conf. Artif. Intell. Statist., 2017, pp. 1273–1282.
[24] L. Wang, W. Wang, and B. Li, “CMFL: Mitigating communication overhead for federated learning,” in Proc. IEEE 39th Int. Conf. Distrib. Comput. Syst. (ICDCS), Jul. 2019, pp. 954–964.
[25] K. Burlachenko, S. Horváth, and P. Richtárik, “FL_PyTorch: Optimization research simulator for federated learning,” 2022, arXiv:2202.03099.
[26] A. Eslami Abyane, D. Zhu, R. Medeiros de Souza, L. Ma, and H. Hemmati, “Towards understanding quality challenges of the federated learning: A first look from the lens of robustness,” 2022, arXiv:2201.01409.
[27] K. Bonawitz et al., “Practical secure aggregation for privacy-preserving machine learning,” in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2017, pp. 1175–1191.
[28] P. Mohassel and Y. Zhang, “SecureML: A system for scalable privacy-preserving machine learning,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2017, pp. 19–38.
[29] C. Zhang, Y. Xie, H. Bai, B. Yu, W. Li, and Y. Gao, “A survey on federated learning,” Knowl.-Based Syst., vol. 216, Mar. 2021, Art. no. 106775.
[30] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, “Federated learning with non-IID data,” 2018, arXiv:1806.00582.
[31] P. Xiao, S. Cheng, V. Stankovic, and D. Vukobratovic, “Averaging is probably not the optimum way of aggregating parameters in federated learning,” Entropy, vol. 22, no. 3, p. 314, 2020.
[32] G. Wang, C. X. Dang, and Z. Zhou, “Measure contribution of participants in federated learning,” in Proc. IEEE Int. Conf. Big Data (Big Data), Dec. 2019, pp. 2597–2604.
[33] Q. Li, B. He, and D. Song, “Model-contrastive federated learning,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 10708–10717.
[34] X. Peng, Z. Huang, Y. Zhu, and K. Saenko, “Federated adversarial domain adaptation,” 2019, arXiv:1911.02054.
[35] H.-Z. Feng et al., “KD3A: Unsupervised multi-source decentralized domain adaptation via knowledge distillation,” 2020, arXiv:2011.09757.
[36] Y. Mansour, M. Mohri, J. Ro, and A. Theertha Suresh, “Three approaches for personalization with applications to federated learning,” 2020, arXiv:2002.10619.
[37] S. Boughorbel, F. Jarray, N. Venugopal, S. Moosa, H. Elhadi, and M. Makhlouf, “Federated uncertainty-aware learning for distributed hospital EHR data,” 2019, arXiv:1910.12191.
[38] X. Liang, Y. Liu, T. Chen, M. Liu, and Q. Yang, “Federated transfer reinforcement learning for autonomous driving,” 2019, arXiv:1910.06001.
[39] F. Zheng, K. Li, J. Tian, and X. Xiang, “A vertical federated learning method for interpretable scorecard and its application in credit scoring,” 2020, arXiv:2009.06218.
[40] A. Taik, H. Moudoud, and S. Cherkaoui, “Data-quality based scheduling for federated edge learning,” in Proc. IEEE 46th Conf. Local Comput. Netw. (LCN), Oct. 2021, pp. 17–23.
[41] R. Giancarlo, D. Scaturro, and F. Utro, “Computational cluster validation for microarray data analysis: Experimental assessment of clest, consensus clustering, figure of merit, gap statistics and model explorer,” BMC Bioinf., vol. 9, no. 1, pp. 1–19, Dec. 2008.
[42] J. Kiefer and J. Wolfowitz, “Stochastic estimation of the maximum of a regression function,” Ann. Math. Statist., vol. 23, no. 3, pp. 462–466, Sep. 1952.
[43] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, “Reading digits in natural images with unsupervised feature learning,” in Proc. NIPS Workshop Deep Learn. Unsupervised Feature Learn., 2011, pp. 1–9.
[44] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[45] J. J. Hull, “A database for handwritten text recognition research,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 5, pp. 550–554, May 1994.
[46] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, “Adapting visual category models to new domains,” in Proc. Eur. Conf. Comput. Vis., 2010, pp. 213–226.
[47] Q. Wang and T. Breckon, “Unsupervised domain adaptation via structured prediction based selective pseudo-labeling,” in Proc. AAAI Conf. Artif. Intell., 2020, vol. 34, no. 4, pp. 6243–6250.
[48] H. Venkateswara, J. Eusebio, S. Chakraborty, and S. Panchanathan, “Deep hashing network for unsupervised domain adaptation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5018–5027.

Zixuan Qin is currently pursuing the master’s degree with the College of Intelligence and Computing, Tianjin University, Tianjin, China.
His research interests include federated learning.

Liu Yang (Member, IEEE) received the Ph.D. degree in computer science from the School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China, in 2016.
She is currently an Associate Professor with the College of Intelligence and Computing, Tianjin University, Tianjin, China. Her research interests include data mining and machine learning.

Fei Gao received the M.S. degree in computer science from the College of Intelligence and Computing, Tianjin University, Tianjin, China, in 2022.
Her research interests include transfer learning and federated learning.

Qinghua Hu (Senior Member, IEEE) received the B.S., M.S., and Ph.D. degrees from Harbin Institute of Technology, Harbin, China, in 1999, 2002, and 2008, respectively.
He was a Post-Doctoral Fellow with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong, from 2009 to 2011. He is currently the Dean of the School of Artificial Intelligence, Tianjin, China, the Vice Chairman of the Tianjin Branch of China Computer Federation, Tianjin, and the Vice Director of the Special Interest Group (SIG) Granular Computing and Knowledge Discovery, Tianjin. He is currently supported by the Key Program, National Natural Science Foundation of China. He has authored or coauthored over 200 peer-reviewed papers. His current research interests include uncertainty modeling in big data, machine learning with multimodality data, and intelligent unmanned systems.
Dr. Hu is an Associate Editor of the IEEE TRANSACTIONS ON FUZZY SYSTEMS, Acta Automatica Sinica, and Energies.

Chenyang Shen received the B.Sc. degree from Yangzhou University, Yangzhou, China, in 2010, and the M.Phil. and Ph.D. degrees from Hong Kong Baptist University, Hong Kong, in 2012 and 2015, respectively.
He is currently with the Division of Medical Physics and Engineering, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, TX, USA. His current research interests include medical imaging, scientific computing, data mining, and deep learning.