
Genomic Predictions Rubus Brazil Frambuesa

The study evaluates genomic prediction strategies in interspecific biparental populations of the Rubus genus, focusing on late leaf rust resistance in raspberry hybrids. It examines the impact of different reference genomes and training population optimization on prediction accuracy, finding that combined stratified sampling approaches enhance predictive accuracy. Results indicate that genomic selection can significantly improve genetic gains in these interspecific crosses.


Euphytica (2024) 220:146

https://doi.org/10.1007/s10681-024-03406-2

RESEARCH

Comparing strategies for genomic predictions in interspecific biparental populations: a case study with the Rubus genus
Allison Vieira da Silva · Melina Prado · Gabriela Romêro Campos · Karina Lima Reis Borges ·
Rafael Massahiro Yassue · Gustavo Husein · Marcel Bellato Sposito · Lilian Amorim ·
José Crossa · Roberto Fritsche‑Neto

Received: 14 April 2024 / Accepted: 27 August 2024 / Published online: 4 September 2024
© The Author(s), under exclusive licence to Springer Nature B.V. 2024

A. V. da Silva (*) · M. Prado · G. R. Campos · G. Husein
Department of Genetics, “Luiz de Queiroz” College of Agriculture, University of São Paulo (USP), Piracicaba, São Paulo, Brazil
e-mail: [email protected]

K. L. R. Borges · J. Crossa · R. Fritsche‑Neto
Rice Research Station, Louisiana State University AgCenter, Rayne, USA

R. M. Yassue
GDM Seeds, Campinas, São Paulo, Brazil

M. B. Sposito
Crop Science Department, Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, SP 13418‑900, Brazil

L. Amorim
Department of Plant Pathology and Nematology, Luiz de Queiroz College of Agriculture (ESALQ), University of São Paulo (USP), Piracicaba, São Paulo, Brazil

Abstract  Genomic selection (GS) is becoming increasingly widespread and applied due to the promising results obtained, cost savings in generating single nucleotide polymorphism (SNP) markers, and the development of statistical models that improve the robustness and accuracy of the analysis. The composition and size of the training population have a major influence on GS, which poses challenges for interspecific biparental populations. Another factor is the use of different reference genomes from other species to perform SNP calling, which could make it possible to explore variability in interspecific crosses comprehensively. Late leaf rust is a disease caused by the pathogen Acculeastrum americanum, and there are reports on genetic resistance in Rubus occidentalis, which leads to the need for interspecific hybridizations, aiming to combine the fruit quality of R. idaeus with the resistance of R. occidentalis. The present study was carried out with a population of 94 interspecific raspberry hybrids. We evaluated the effect of different reference genomes on SNP marker discovery, as well as the effect of training population optimization strategies on the accuracy of genomic predictions, namely CV-α, leave-one-family-out (LOFO), pairwise families, and stratified k-fold. The average predictive accuracies ranged from −0.33 to 0.44, and we demonstrated higher prediction accuracy and more precise estimates when we combined stratified sampling to compose the training set (CV-α and k-fold stratified CV) and the panel of Unique markers. These results corroborate that genomic prediction aligned with SNP calling and training population optimization strategies can significantly increase genetic gains in interspecific biparental crosses.

Keywords  Interspecific · Hybrids · Training set · Genomic selection · Rubus

Abbreviations
ANOVA   Analysis of variance
CV      Cross-validation
GBLUP   Genomic best linear unbiased predictor
GBS     Genotyping by sequencing
GS      Genomic selection
LD      Linkage disequilibrium
LOFO    Leave-one-family-out
PF      Pairwise family
SNP     Single nucleotide polymorphism
SS      Sum of squares
TS      Training population

Introduction

Every year, genomic selection (GS) (Bernardo 1994; Meuwissen et al. 2001) becomes more widespread and applied due to the promising results obtained, cost reduction in the generation of SNP markers, and the development of statistical models that allow the inclusion of more and more data (Crossa et al. 2017; Lebedev et al. 2020; Montesinos-López et al. 2023). In addition, GS makes it possible to shorten the selection cycle, which has a major impact, especially for perennial species, allowing early selection of plants that are still small and before fruiting (Kainer et al. 2015; Iwata et al. 2016; Lebedev et al. 2020). The application of GS in perennial species saves physical space and maintenance costs in trials since superior genotypes are selected early and all efforts are focused on the selected individuals (Kainer et al. 2015; Fritsche-Neto et al. 2012).

For genomic prediction, the models are developed from a set of genotyped and phenotyped individuals, which form the training population (TS), and are then applied to the test population, containing individuals that connect the two populations by kinship. This makes it possible to obtain an estimate of the breeding values of the individuals in the breeding population without the need to know the phenotype of these individuals, using only genotypic data and the genetic relationship between individuals (Desta and Ortiz 2014; Kwong et al. 2017; Xu et al. 2019). Selection based solely on genotypic data can be carried out in the early developmental stages, speeding up the breeding program by shortening the crop cycle (Xu et al. 2019; Montesinos-López et al. 2023).

The implementation of GS in biparental interspecific populations becomes complex (Olatoye et al. 2020). In this scenario, the composition and size of the training population have the greatest influence on predictive ability, as genetic structure within families and between species is common (Tan et al. 2017). In addition, interspecific hybridization is a naturally occurring phenomenon that generates great genetic variability, new haplotypes, and major changes in population allele frequencies and is one of the main mechanisms responsible for plant adaptation to abrupt environmental changes and the emergence of new species (Runemark et al. 2019). These factors must be considered when building the training population to mitigate bias in predictive models. Another possibility arising from using these interspecific populations is the availability of alternative reference genomes from the different species to perform SNP calling, which may allow the variability and compatibility of crosses to be explored (Lara et al. 2019).

In this context, the best-known raspberry species are the black raspberry (Rubus occidentalis) and the red raspberry (Rubus idaeus), with the red ones being economically more important and more consumed than the black ones (Baby et al. 2018). In addition, the most widely cultivated varieties worldwide have a narrow genetic base. Wild relatives occur in diverse environments, and this variability represents a genetic resource available for study and the development of new cultivars (Hall et al. 2009). Specifically in this crop, late leaf rust is a disease caused by the pathogen Acculeastrum americanum, which causes premature defoliation, increasing susceptibility to winter damage. The pathogen also infects the fruit, making it unsuitable for the fresh fruit market. The disease begins with small orange spots on the abaxial part of the leaf, turning brown over time. Young leaves are the last to show disease symptoms; middle-aged leaves are the most susceptible (Ellis et al. 1991; Hall et al. 2009). The literature on the genetic basis of disease resistance in the “Jewel” variety is scarce, as is the literature on adaptation to tropical climatic conditions. The rare reports on genetic resistance to this pathogen are in the Rubus occidentalis species (Hall et al. 2009), which leads to an urgent need for interspecific hybridization aiming to combine the fruit quality of the R. idaeus species with the resistance of the R. occidentalis species. In a recent study, Prado et al. (2024) found evidence that resistance to raspberry late leaf rust is polygenic, with regions of major and minor effects that play significant roles in rust resistance in raspberry.

Given the above, the objective was to evaluate the effect of the reference genome for the discovery of SNP markers and the training population composition strategies on the accuracy of genomic prediction models for late leaf rust in biparental and interspecific populations of the Rubus genus.

Material and methods

Biological material

Ninety-four raspberry hybrids were obtained from crosses in partnership with the Plant Production department at ESALQ/USP. The crosses were made using a testcross scheme, in which the first group consisted of three parents of the Rubus idaeus species of the “Golden Bliss”, “Salmon”, and “Himbo Top” varieties, with favorable morpho-agronomic characteristics and different levels of susceptibility to late leaf rust. The second group comprised Rubus occidentalis parents of the “Jewel” variety, a source of late leaf rust resistance alleles. Different numbers of hybrids were obtained from each cross, and in all crosses, the “Jewel” variety was used as the female parent (Table 1).

More details about the materials used can be found in the work published by Campos et al. (2023). In that work, the authors characterized the diversity and genetic structure of this panel of hybrids, which initially had 116 genotypes. Still, because they are temperate climate materials, some of the genotypes were lost in the process of adapting to the climatic conditions found in Piracicaba, São Paulo, Brazil (22°42′S, 47°38′W, 540 m), where the experiment was conducted. In addition to climatic adversities, genetic incompatibility may have led to the loss of some materials.

Table 1  Number of hybrids (N) obtained from crosses between the Jewel variety, as the female parent in all crosses, and the Golden Bliss (JG), Salmon (JS), and Himbo Top (JT) varieties

Parent   Jewel   Golden Bliss   Salmon   Himbo Top
Family   –       JG             JS       JT
N        –       35             28       31

Inoculation

The inoculum of the fungus Acculeastrum americanum was prepared by the Phytopathology Department of ESALQ/USP. Suspensions were obtained using 50 mL of distilled water, Tween 20 (0.01%), and A. americanum urediniospores. The suspension concentration was adjusted to 10⁴ urediniospores/mL in a Neubauer chamber and used to inoculate by spraying the abaxial side of the leaves to the point of runoff. In order to guarantee the development of the disease, the pots were covered for 24 h with a plastic bag to create a humid chamber.

Phenotyping and conducting the experiment

The experimental unit was defined as a single plant in a 5-L pot. The plants were grown in a greenhouse in Piracicaba-SP, Brazil (22°42′S, 47°38′W, 540 m). The experimental units were arranged in an augmented block design repeated across time, with three blocks, where the parents (“Golden Bliss,” “Salmon,” “Himbo Top,” and “Jewel”) were used as checks and were present in each one of the blocks. The experiment was repeated twice, once in November 2021 and again in April 2022. Phenotyping was based on classifying the plants by the severity of the disease on their leaves. The severity of the disease was classified on the seventeenth day after inoculation based on the diagrammatic scale proposed by Dias et al. (2022) with eight levels, from 0 to 8, with 0 being the absence of the disease and 8 being the most severe stage. Three measurements were taken for each plant by three different evaluators, and the arithmetic mean of the three observations was used as the final classification.

Genomic characterization

The DNA from the samples was extracted from the leaf tissue according to the protocol suggested by the manufacturer of the extraction kit (Qiagen), using the DNeasy Plant Mini Kit. After extraction, the genomic library was built based on the protocol proposed by Poland et al. (2012) with some adaptations. The enzymes PstI (rare cut) and MseI (frequent cut) from New England BioLabs Inc.® were used to digest the DNA. Libraries were sequenced on a HiSeq 2500 System sequencer (Illumina, Inc).

The samples were sequenced on the Illumina platform in collaboration with the Genetic Diversity and Improvement Laboratory of the Genetics Department at ESALQ/USP. SNP calling was carried out using TASSEL-GBS (Glaubitz et al. 2014). The sequences were aligned using two reference genomes available for raspberries: the genome of the red raspberry, Rubus idaeus (scaffolds), and the genome of the black raspberry, Rubus occidentalis (chromosomes). A total of 275,904,265 sequences were identified and submitted to quality control. Only SNPs with a minor allele frequency (MAF) of at least 20% and a call rate of over 90% were selected. This MAF value was used to control possible errors from the genotyping platform and because we were interested in the resistance alleles of the maternal progenitor. Missing data were imputed using the kNNI method from the impute package (Hastie et al. 2022). In the filtering for missing data, no individuals with a percentage of missing data greater than 30% were identified.

After quality control, three marker matrices were generated. The Moc matrix, with 20,382 SNPs, was obtained by aligning the sequences to the reference genome of the R. occidentalis species. The Mid matrix, with 20,133 SNPs, was generated by aligning the sequences to the genome of the R. idaeus species. The third matrix, the Unique matrix, with 30,398 SNPs, was obtained by joining the Moc and Mid matrices and eliminating redundant markers between the two matrices (Campos et al. 2023). The markers were also filtered for linkage disequilibrium (LD); the LD between markers was calculated within 100-kbp intervals using the correlation method, and the 99% threshold was used to filter the markers. The three marker matrices were the same as those used in a previous work published by Campos et al. (2023), in which, using principal component analysis, we evaluated how the hybrid population is grouped according to each of the matrices.

Phenotypic analysis

The phenotypic data were analyzed using the statgenSTA package (Rossum et al. 2023) in the R environment using the following model:

$$\mathbf{y} = \mathbf{X}_1\mathbf{r} + \mathbf{X}_2\mathbf{g} + \mathbf{Z}\mathbf{b} + \mathbf{e} \tag{1}$$

where $\mathbf{y}$ is the vector of observed severity values in a given stage; $\mathbf{r}$ is the fixed effect of repetition; $\mathbf{g}$ is the fixed effect of genotype; $\mathbf{b}$ is the random effect of block nested within repetition, where $\mathbf{b} \sim N(\mathbf{0}, \mathbf{I}\sigma_b^2)$; $\mathbf{e}$ is the random effect of the residuals, where $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$; $\mathbf{X}_1$ and $\mathbf{X}_2$ are the fixed effect incidence matrices; $\mathbf{Z}$ is the random effect incidence matrix; and $\mathbf{I}$ is an identity matrix. The adjusted means for each genotype and stage from this model were used in the subsequent analyses.

Prediction models

Two prediction models were tested, the additive and the additive-dominant (VanRaden 2008):

$$\mathbf{y}^{*} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\alpha} + \mathbf{Z}\boldsymbol{\delta} + \mathbf{e} \tag{2}$$

where $\mathbf{y}^{*}$ corresponds to the adjusted means of the individuals from the first stage, with dimension $n \times 1$, where $n$ corresponds to the number of observations; $\mathbf{X}$ corresponds to the incidence matrix of the fixed effects; $\boldsymbol{\beta}$ corresponds to the vector of the fixed effect of the intercept; $\mathbf{Z}$ corresponds to the incidence matrix of the random (genetic) effects, with dimension $n \times n$, where $n$ is equal to the number of genotypes; $\boldsymbol{\alpha}$ and $\boldsymbol{\delta}$ correspond to the vectors of additive and dominance effects, respectively, where $\boldsymbol{\alpha} \sim N(\mathbf{0}, \mathbf{G}_a\sigma_a^2)$ and $\boldsymbol{\delta} \sim N(\mathbf{0}, \mathbf{G}_d\sigma_d^2)$; $\mathbf{G}_a$ and $\mathbf{G}_d$ are the additive and dominance genomic relationship matrices, respectively, obtained by the methodology described by VanRaden (2008) using the snpReady package (Granato et al. 2018). The residual is represented by $\mathbf{e}$, where $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$. The additive model omits the $\mathbf{Z}\boldsymbol{\delta}$ term. The prediction models were implemented in the R software with the help of the BGLR package (Perez and Los Campos 2014) using 15,000 burn-ins and 30,000 iterations.

Narrow-sense genomic heritability was calculated based on the variance components obtained with the G matrix from fitting the additive GBLUP (Genomic Best Linear Unbiased Predictor) model in the BGLR package. We used the ratio between the genetic variance based on the markers and the sum of the genetic variance based on the markers and the residual variance. Heritability was estimated for each of the marker panels.
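As an illustration of the additive relationship matrix and the heritability ratio described in this section, the following is a minimal Python sketch, not the authors' R workflow. It assumes genotypes coded 0/1/2 with no missing values, and it uses the first form of the relationship matrix given by VanRaden (2008); the function names `vanraden_additive_g` and `genomic_heritability`, the toy genotypes, and the variance values are hypothetical. The snpReady and BGLR implementations may differ in detail.

```python
import numpy as np


def vanraden_additive_g(markers: np.ndarray) -> np.ndarray:
    """Additive genomic relationship matrix in the style of VanRaden (2008).

    markers: individuals x SNPs array coded 0/1/2 (copies of one allele),
    assumed to be already imputed (no missing values).
    """
    p = markers.mean(axis=0) / 2.0        # allele frequency of each SNP
    w = markers - 2.0 * p                 # centre each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))   # sum of expected heterozygosities
    return (w @ w.T) / scale


def genomic_heritability(var_additive: float, var_residual: float) -> float:
    """Marker-based genetic variance over genetic plus residual variance."""
    return var_additive / (var_additive + var_residual)


# Toy genotypes for four individuals and three SNPs (made-up values).
toy = np.array([[0, 1, 2],
                [1, 1, 0],
                [2, 0, 1],
                [1, 2, 1]], dtype=float)
G = vanraden_additive_g(toy)

# Made-up variance components, only to show the ratio.
h2 = genomic_heritability(var_additive=1.0, var_residual=3.0)
```

Because each SNP column is centred, every row of `G` sums to zero, which is a quick sanity check on the construction. In the study itself the variance components come from the fitted GBLUP model rather than from fixed numbers.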
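The marker quality-control thresholds given under Genomic characterization (MAF of 20% and call rate over 90%) can be illustrated in the same spirit. This is a hedged sketch only: the study applied its filters to TASSEL-GBS output in R, whereas the function `qc_keep_mask`, the 0/1/2 coding with `np.nan` for missing calls, and the choice of inclusive or strict inequalities are assumptions made here for illustration.

```python
import numpy as np


def qc_keep_mask(markers, maf_min=0.20, call_rate_min=0.90):
    """Boolean mask of SNP columns that pass the MAF and call-rate filters.

    markers: individuals x SNPs array coded 0/1/2, np.nan for missing calls.
    """
    called = ~np.isnan(markers)
    n_called = called.sum(axis=0)
    call_rate = n_called / markers.shape[0]
    # Allele frequency from observed calls only; avoid division by zero.
    allele_sum = np.nansum(markers, axis=0)
    p = allele_sum / (2.0 * np.maximum(n_called, 1))
    maf = np.minimum(p, 1.0 - p)
    return (call_rate > call_rate_min) & (maf >= maf_min)


# Toy data (made-up): four individuals, four SNPs.
toy = np.array([[0, 2, 0, np.nan],
                [1, 2, 0, 1],
                [2, 2, 1, 1],
                [1, 2, 0, 2]], dtype=float)
mask = qc_keep_mask(toy)
filtered = toy[:, mask]
```

In the toy matrix only the first SNP is kept: the second is monomorphic, the third has a MAF of 0.125, and the fourth has a call rate of 0.75.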

Table 2  Factors considered when evaluating the accuracy of the prediction models tested

Factors
CV       Panel    Model
CV-α     Mid      A
LOFO     Moc      AD
PF       Unique
K-fold

Four cross-validation (CV) strategies were evaluated: the CV-α method (Yassue et al. 2021), leave-one-family-out (LOFO), pairwise families (PF), and stratified k-fold (STRAT). The three marker matrices (Panel) are represented by Mid (generated from the reference genome of the R. idaeus species), Moc (generated from the reference genome of the R. occidentalis species), and Unique (generated from combining the unique markers present in the other two matrices). A and AD, respectively, represent the additive and additive-dominant models

Fig. 1  Cross-validation schemes evaluated in the study. The number (1–5) indicates the folds and JG (Jewel x Golden Bliss), JS (Jewel x Salmon), JT (Jewel x Himbo Top) the crossings. a CV-α, b K-fold with stratification balanced by family, keeping 20% and 80% of each family in the testing and training set, respectively, c LOFO (leave-one-family-out), d PF (pairwise family)

Cross‑validation of prediction models

Four cross-validation schemes were used (Table 2). In the first (Fig. 1a), we used the CV-α method (Yassue et al. 2021) with five folds and four repetitions. In the second (Fig. 1b), the K-fold was stratified with five folds, in which each family contributed 20% of its individuals to each fold, so that 80% of each family composed the training population and 20% the validation set. In the third, LOFO (leave-one-family-out), one family was used as the validation population, and the other two families constituted the training population, with a total of three repetitions (Fig. 1c). In the fourth scheme, PF (Fig. 1d), we separated the families into pairs and used one of the families as the TS and the other as the validation population in all six possible combinations between the three families (JG/JS, JS/JG, JG/JT, JT/JG, JS/JT, and JT/JS). In each scenario, the accuracy of the prediction was estimated using
Pearson’s correlation between the adjusted mean values and the values estimated by the GBLUP model. An analysis of variance was carried out to assess the influence of each factor on the mean prediction accuracies. The effect size corresponds to the eta-squared calculated for each effect, that is, the proportion of the total sum of squares (SS) accounted for by the sum of squares of that effect, according to the following equation:

$$\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}} \tag{3}$$

Results

The hybrids showed great phenotypic variability in terms of disease severity. The values ranged from the absence of the disease to the maximum level of severity among the genotypes, and variation could be observed within all the families (Fig. 2). The calculated heritabilities for the Mid, Moc, and Unique matrices were 0.37, 0.38, and 0.33, respectively.

The average predictive accuracies ranged from −0.33 to 0.44. The results obtained using the additive and additive-dominant models (labeled A and AD in Fig. 3) show a similar pattern in terms of mean and data dispersion when combined with the marker panel and the cross-validation method.

The different marker panels showed variation in terms of mean and distribution depending on the cross-validation method used. In the LOFO and PF cross-validation methods, the three marker matrices also showed a wide dispersion in prediction accuracy across folds, although PF generated a higher mean prediction accuracy (Fig. 3). Although in the ANOVA only the SNP panel had a significant effect on the average accuracy values obtained (Table 3), we could see that the Unique matrix provides better accuracy when combined with CV methods that do not consider family structure. In the CV-α method, we observed the smallest dispersion of results around the mean. Using the CV-α and K-fold methods, the Unique matrix results indicated a gain in prediction accuracy compared to the other two marker matrices.

Fig. 2  Barplot in ascending order of the distribution of values for disease severity on the Y axis and each of the hybrids and the parents of each family on the X axis. a distribution of severity values for the JG family, where the vertical line in black represents the value presented by the “Jewel” parent (0.26) and the line in red represents the Golden Bliss parent (4.31). b distribution of severity values for the JS family, where the vertical black line represents the value presented by the “Jewel” parent (0.26) and the red line represents the Salmon parent (5.38). c distribution of severity values for the JT family, where the vertical black line represents the value presented by the “Jewel” parent (0.26) and the red line represents the Himbo Top parent (6.51)

Fig. 3  Accuracy of severity prediction as a function of the CV-α, K-fold (with five folds and 20% representation of each of the three families), LOFO (leave-one-family-out), and PF (pairwise families) cross-validation schemes, respectively. The values obtained from the additive model are on the left, represented by A, and the values from the additive-dominant model are on the right, represented by AD. The variation in the coloring of the box plots represents the three panels of SNPs

Table 3  Analysis of variance carried out to assess the influence of each factor on the mean prediction accuracies

Source      Df    Sum Sq   F       Pr(>F)    Eta-squared
CV          3     0.1198   2.000   0.1188    0.0511
Panel       2     0.1676   4.194   0.0178*   0.0722
Model       1     0.0155   0.775   0.3808    0.0066
Residuals   101   2.0176

CV corresponds to the cross-validation scheme, Panel corresponds to the marker panel used (Mid, Moc, or Unique), and Model corresponds to the prediction model (additive GBLUP or additive and dominant GBLUP). The effect size corresponds to the eta-squared calculated for each effect (Cohen 1988)
*Statistically significant with p < 0.05

Discussion

Ideally, a prediction model should enable GS on individuals from several different populations while maintaining satisfactory accuracy values for the target trait. One of the assumptions of the GBLUP model, and the majority of regression methods, is that the effect of allelic substitution is homogeneous for all traits among all individuals (de los Campos & Sorensen 2014). However, prediction accuracy is influenced by several factors, such as the size and genetic diversity of the training population, the heritability of the trait of interest, the density of the marker panel, the effects of markers and genes on the trait, the extent and distribution of LD between markers and QTLs (Kaler et al. 2022), the presence of population structure, differences in the linkage phases of haplotypes, and large variations in allele frequencies among the subpopulations or families that compose the training set and the population to be predicted.

In a preliminary study, Campos et al. (2023) observed that the three marker panels performed similarly in assessing genetic diversity and structure parameters. In our study, we observed different performances regarding prediction accuracy between the marker panels, where the Unique matrix performed best. One aspect that should be taken into account is the different sizes of the marker panels: the Unique matrix has 30,398 SNPs, the Moc matrix has 20,382 SNPs, and the Mid matrix has 20,133 SNPs. A greater density of markers can improve genomic predictions but also results in higher genotyping costs, so it is important to balance the gains in prediction accuracy with the costs associated with obtaining a denser panel (DoVale et al. 2022). In addition, it is commonly agreed that the gains in prediction accuracy from increasing the number of markers eventually reach a plateau (Krishnappa et al. 2021). Using different reference genomes from different species to perform SNP calling has made it possible to explore variability in interspecific crosses comprehensively (Lara et al. 2019). In this context, the use of a marker panel based on the two genomes may have made it possible to capture rare variants of lesser effect and different haplotypes in different linkage phases. Thereby, this marker panel probably better captured the effect of allelic substitution and consequently improved the accuracy (Rooney et al. 2022; MacLeod et al. 2016).

Genomic selection in populations with family structure becomes complex and challenging, as with the interspecific population used (Tan et al. 2017; Olatoye et al. 2020). The presence of genetic structure can reduce the stability and predictive accuracy of a model across populations due to different aspects. Lehermeier et al. (2015) observed differences in marker effect estimates between different clusters in the same population due to differences in the LD pattern of markers and QTL. Legarra et al. (2021) observed that allelic substitution effects could vary between populations and across generations due to changes in genetic relationships, magnitude of additive and non-additive variances, and allele frequencies. In this scenario, the composition and size of the training population have a major impact on the predictive capacity of the model since the model tends to perform better if trained on a group of individuals that best represent all these aspects of population diversity (Isidro et al. 2015; Tan et al. 2017; Berro et al. 2019; Montesinos-López et al. 2024). These factors must be considered when building the training population in order to mitigate bias in predictive models.

Raspberry is an allogamous species, and it is important to highlight that in the specific case of this population under study, the F1 segregating
generation was generated from a set of interspecific population can be a tool used to minimize the effect
crosses. Nevertheless, there is a lack of background of genetic structure by sampling in proportion to the
knowledge on the genetic architecture of resistance size of each cluster, thus potentially capturing genetic
against leaf late rust disease in raspberries. In this diversity in the training population and improving
scenario, we sought to expand the GBLUP model to the model’s predictive capacity (Isidro et al. 2015;
evaluate the inclusion of the dominance effect in the Hoque et al. 2024). The K-fold method considers
GBLUP model. The use of this model, considering balanced sampling across families, with 20% of the
the effects of additivity and dominance, has had a individuals from each family being sampled to form
positive impact on gaining prediction accuracy and each of the five folds. The sampling of individuals
selecting elite clonally propagated materials (Resende within each family was done randomly. In contrast,
et al. 2017; Nadeau et al. 2023) such as raspberry. CV-α sampling remains random and does not account
However, we observed that the inclusion of the for the existence of three different families or their
dominance effect did not generate gains in prediction sizes. Our population is small, and we have different
accuracy compared to the additive model. It may numbers of individuals within each family. Although
be due to the absence of a significant dominance the numbers are relatively close, the JG family has
effect or other influencing factors. The total size of 35 individuals, while the JS family has 28, meaning
the population and the number of individuals per the JG family has 25% more individuals than the JS
family are factors that influence the estimation of family. These differences in family size and sampling
the dominance effect and, consequently, can impact methodology may explain the variations in prediction
the gain in prediction accuracy from the inclusion accuracy between the methods with random
of dominance (Tan et al. 2018). The limited number sampling.
of individuals in our populations makes it difficult Overall, our results highlight the importance of
to accurately estimate dominance effects. So, we did carefully designing the training set for raspberry
not observe any differences between the accuracies obtained with and without the inclusion of the effect. Regarding the cross-validation scheme and the composition of the training population, we observed higher prediction accuracies when we combined the CV-α and the K-fold method with stratified sampling for the composition of the TS together with the Unique marker matrix. Although the panel was the only statistically significant factor according to the ANOVA, we observed that the Unique matrix generated accuracies similar to those of the other matrices when we considered the family structure (LOFO and PF) in the cross-validation scheme. Due to the larger number of markers, we expected the Unique panel to improve prediction accuracy across all CV schemes. However, our results suggest that the increase in the number of markers was not sufficient to compensate for the lower relatedness between the training and testing sets in the LOFO and PF schemes. The CV-α method provided estimates of prediction accuracy with less dispersion than the other CV methods evaluated, as the main purpose of the method is to allocate genotypes to folds in such a way as to maximize the independence of accuracy estimation errors (Yassue et al. 2021). The K-fold method with a stratified composition of the training

breeding when using genomic selection to make predictions across populations. When dealing with structured populations or family structures, it is crucial to balance and stratify the breeding population into training and testing sets to minimize potential bias in effect estimation, especially in small populations. This approach helps reduce the risk of encountering the negative accuracies observed in pairwise and leave-one-family-out (LOFO) cross-validations. Strategies such as test-and-shelf may present a viable alternative for implementing genomic selection in raspberry breeding (Boyles et al. 2024). On the other hand, these strategies require much larger and more diverse training sets, and this aspect should be carefully analyzed to balance the costs of genotyping and phenotyping a larger number of individuals against the gains in prediction accuracy (Wu et al. 2023).

Genetic structure can arise from different levels of genetic relatedness between individuals, including the separation of individuals into families (Würschum et al. 2017). Studies show that genomic prediction within families can generate significant gains in prediction accuracy in the presence of different patterns of LD of the markers, allele frequencies, and different substitution effects
(Würschum et al. 2017; Berro et al. 2019). Thus, making predictions within raspberry families would be a useful tool to elucidate the impact of genetic structure at the family level. However, one of the biggest challenges in making predictions in raspberry is the difficulty in obtaining and maintaining hybrids, since the limited number of individuals commonly sampled makes it impossible to build training and validation populations with a satisfactory number of individuals (Montesinos-López et al. 2024). In regions with a hot and humid climate, raspberries have limitations regarding vegetative development, flower production, and fruit set. Long-term exposure to stress caused by high temperatures can inhibit photosynthesis and cause premature plant death (Fernandez et al. 2018). This study was carried out in the municipality of Piracicaba, State of São Paulo, Brazil (22°42′30″ S, 47°38′00″ W, 546 m). The location has a tropical climate with a dry winter season, classified as Aw in the Köppen climate classification (Dias et al. 2017). The experiment was planned to accommodate more individuals, with more than 160 hybrids obtained from the crosses. Unfortunately, many of the individuals did not adapt to the tropical climatic conditions and were lost during the planning phase of the work. This loss added to the difficulty of obtaining viable individuals from interspecific crosses due to gametic incompatibility (Pinczinger et al. 2021).

With this work, we have provided the community with an initial attempt to implement GS in a population of interspecific raspberry hybrids. Among the factors that added complexity to the development of the study, we can highlight the limited size of the populations analyzed and the scarce literature on the genetic basis of disease resistance in the “Jewel” variety, as well as on adaptation to tropical climatic conditions. We demonstrated higher prediction accuracy and more precise estimates when we combined stratified sampling to compose the training set (CV-α and stratified K-fold CV) with the panel of Unique markers. We have provided important information on the complexity of efficiently sampling the genetic diversity of the genomes of the two species and a first direction in developing a strategy for constructing the TS. Additionally, we demonstrated the effect of population structure under different CV schemes in interspecific raspberry hybrids.

Author contributions AVS, JC and RFN elaborated on the hypothesis, conducted the analyses, helped to interpret the results, and contributed to the writing. LA and MBS were responsible for funding and elaborated on the hypothesis. MP, GRC, RMY, GH, and KLRB contributed to the panel evaluation and characterization, helped to interpret the results, and contributed to the writing and discussion. All authors read and approved the final manuscript.

Funding The authors have not disclosed any funding.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

References

Baby B, Antony P, Vijayan R (2018) Antioxidant and anticancer properties of berries. Crit Rev Food Sci Nutr 58:2491–2507. https://doi.org/10.1080/10408398.2017.1329198
Bernardo R (1994) Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci 34:20–25
Berro I, Lado B, Nalin RS, Quincke M, Gutiérrez L (2019) Training population optimization for genomic selection. Plant Genome 12:190028. https://doi.org/10.3835/plantgenome2019.04.0028
Boyles RE, Ballén-Taborda C, Brown-Guedira G, Costa J, Cowger C, DeWitt N, Griffey CA, Harrison SA et al (2024) Approaching 25 years of progress towards Fusarium head blight resistance in southern soft red winter wheat (Triticum aestivum L.). Plant Breed 143:66–81. https://doi.org/10.1111/pbr.13137
Campos GR, Prado M, Borges KLR, Yassue RM, Sabadin F, Silva AV, Barbosa CMA, Sposito MB, Amorim L, Fritsche-Neto R (2023) Construction and genetic characterization of an interspecific raspberry hybrids panel aiming resistance to late leaf rust and adaptation to tropical regions. Sci Rep 13:15216. https://doi.org/10.1038/s41598-023-41728-8
Cohen J (1988) Statistical power analysis for the behavioral sciences. Academic Press, New York
Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, De Los CG, Burgueño J, González-Camacho JM, Pérez-Elizalde S, Beyene Y, Dreisigacker S, Singh R, Zhang X, Gowda M, Roorkiwal M, Rutkoski J, Varshney RK (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 22:961–975. https://doi.org/10.1016/j.tplants.2017.08.011
de los Campos G, Sorensen D (2014) On the genomic analysis of data from structured populations. J Anim Breed Genet 131:163–164. https://doi.org/10.1111/jbg.12091
Desta ZA, Ortiz R (2014) Genomic selection: Genome-wide prediction in plant improvement. Trends Plant Sci 19:592–601. https://doi.org/10.1016/j.tplants.2014.05.006
Dias HB, Alvares CA, Sentelhas PC (2017) A century of meteorological data in Piracicaba, SP: Climate changes according to the Köppen classification. In: Brazilian Congress
of Agrometeorology, Symposium on Climate Change and Desertification of the Brazilian Semiarid.
Dias MG, Ribeiro RR, Barbosa A, Jesus CM, Spósito MB (2022) Diagrammatic scale for improved late leaf rust severity assessments in raspberry leaves. Can J Plant Path 45(2):140–147. https://doi.org/10.1080/07060661.2022.2147587
DoVale JC, Carvalho HF, Sabadin F et al (2022) Genotyping marker density and prediction models effects in long-term breeding schemes of cross-pollinated crops. Theor Appl Genet 135:4523–4539. https://doi.org/10.1007/s00122-022-04236-3
Ellis MA, Converse RH, Williams RN, Williamson B (1991) Compendium of raspberry and blackberry diseases and insects, 2nd edn. APS Press, St. Paul
Fernandez GE, Molina-Bravo R, Takeda F (2018) What we know about heat stress in rubus. In: Raspberry: breeding, challenges and advances, pp 29–40
Fritsche-Neto R, Resende MDV, Miranda GV, DoVale JC (2012) Seleção genômica ampla e novos métodos de melhoramento do milho. Revista Ceres 59:794–802. https://doi.org/10.1590/S0034-737X2012000600009
Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ et al (2014) TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS ONE 9(2):e90346. https://doi.org/10.1371/journal.pone.0090346
Granato ISC, Galli G, de Oliveira Couto EG, Souza MB, Mendonça LF, Fritsche-Neto R (2018) snpReady: a tool to assist breeders in genomic analysis. Mol Breeding 38:102. https://doi.org/10.1007/s11032-018-0844-8
Hall HK, Hummer KE, Jamieson AR, Jennings SN, Weber CA (2009) Plant breeding reviews. Wiley-Blackwell, New Jersey
Hastie T, Tibshirani R, Narasimhan B, Chu G (2022) Impute: Imputation for microarray data. R package version 1.70.0
Hoque A, Anderson JV, Rahman M (2024) Genomic prediction for agronomic traits in a diverse Flax (Linum usitatissimum L.) germplasm collection. Sci Rep 14:3196. https://doi.org/10.1038/s41598-024-53462-w
Isidro J, Jannink JL, Akdemir D, Poland J, Heslot N, Sorrells ME (2015) Training set optimization under population structure in genomic selection. Theor Appl Genet 128:145–158. https://doi.org/10.1007/s00122-014-2418-4
Iwata H, Minamikawa MF, Kajiya-Kanegae H, Ishimori M, Hayashi T (2016) Genomics-assisted breeding in fruit trees. Breed Sci 66:100–115. https://doi.org/10.1270/jsbbs.66.100
Kainer D, Lanfear R, Foley WJ, Külheim C (2015) Genomic approaches to selection in outcrossing perennials: focus on essential oil crops. Theor Appl Genet 128:2351–2365. https://doi.org/10.1007/s00122-015-2591-0
Kaler AS, Purcell LC, Beissinger T, Gillman JD (2022) Genomic prediction models for traits differing in heritability for soybean, rice, and maize. BMC Plant Biol 22:87. https://doi.org/10.1186/s12870-022-03479-y
Krishnappa G, Savadi S, Tyagi BS, Singh SK, Mamrutha HM, Kumar S, Mishra CN, Khan H, Gangadhara K, Uday G et al (2021) Integrated genomic selection for rapid improvement of crops. Genomics 113:1070–1086. https://doi.org/10.1016/j.ygeno.2021.02.007
Kwong QB, Ong AL, Teh CK, Chew FT, Tammi M, Mayes S, Kulaveerasingam H, Yeoh SH, Harikrishna JA, Appleton DR (2017) Genomic selection in commercial perennial crops: applicability and improvement in oil palm (Elaeis guineensis Jacq.). Sci Rep 7:2872. https://doi.org/10.1038/s41598-017-02602-6
Lara LAC, Santos MF, Jank L, Chiari L, Vilela MM, Amadeu RR, Dos Santos JPR, Pereira GDS, Zeng ZB, Garcia AAF (2019) Genomic selection with allele dosage in panicum maximum jacq. G3 Bethesda 9:2463–2475. https://doi.org/10.1534/g3.118.200986
Lebedev VG, Lebedeva TN, Chernodubov AI, Shestibratov KA (2020) Genomic selection for forest tree improvement: methods. Achiev Perspect Forests 11:1190. https://doi.org/10.3390/f11111190
Legarra A, Garcia-Baccino CA, Wientjes YCJ, Vitezica ZG (2021) The correlation of substitution effects across populations and generations in the presence of nonadditive functional gene action. Genetics 219:iyab138. https://doi.org/10.1093/genetics/iyab138
Lehermeier C, Schön CC, de Los CG (2015) Assessment of genetic heterogeneity in structured plant populations using multivariate whole-genome regression models. Genetics 201:323–337. https://doi.org/10.1534/genetics.115.177394
MacLeod IM, Bowman PJ, Vander Jagt CJ, Haile-Mariam M, Kemper KE, Chamberlain AJ, Schrooten C, Hayes BJ, Goddard ME (2016) Exploiting biological priors and sequence variants enhances QTL discovery and genomic prediction of complex traits. BMC Genom 17:144. https://doi.org/10.1186/s12864-016-2443-6
Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genomewide dense marker maps. Genetics 157:1819–1829
Montesinos-López OA, Bentley AR, Saint Pierre C, Crespo-Herrera L, Rebollar-Ruellas L, Valladares-Celis PE, Lillemo M, Montesinos-López A, Crossa J (2023) Efficacy of plant breeding using genomic information. Plant Genome 16(2):e20346. https://doi.org/10.1002/tpg2.20346
Montesinos-López OA, Crespo-Herrera L, Xavier A, Godwa M, Beyene Y, Saint Pierre C, de la Rosa-Santamaria R, Salinas-Ruiz J, Gerard G, Vitale P, Dreisigacker S, Lillemo M, Grignola F, Sarinelli M, Pozzo E, Quiroga M, Montesinos-López A, Crossa J (2024) A marker weighting approach for enhancing within-family accuracy in genomic prediction. G3 Genes Genom Genet 14(2):278. https://doi.org/10.1093/g3journal/jkad278
Nadeau S, Beaulieu J, Gezan SA, Perron M, Bousquet J, Lenz PRN (2023) Increasing genomic prediction accuracy for unphenotyped full-sib families by modeling additive and dominance effects with large datasets in white spruce. Front Plant Sci 14:1137834. https://doi.org/10.3389/fpls.2023.1137834
Olatoye MO, Clark LV, Labonte NR, Dong H, Dwiyanti MS, Anzoua KG, Brummer JE, Ghimire BK, Dzyubenko E, Dzyubenko N, Bagmet L, Sabitov A, Chebukin P, Głowacka K, Heo K, Jin X, Nagano H, Peng J, Yu CY, Yoo JH, Zhao H, Long SP, Yamada T, Sacks EJ, Lipka AE (2020) Training population optimization for genomic selection in miscanthus. G3 Genes Genom Genet 10(7):2465–2476. https://doi.org/10.1534/g3.120.401402
Pérez P, los Campos G (2014) Genome-wide regression and prediction with the BGLR statistical package. Genetics 198(2):483–495. https://doi.org/10.1534/genetics
Pinczinger D, von Reth M, Hanke MV, Flachowsky H (2021) Self-incompatibility of raspberry cultivars assessed by SSR markers. Sci. Hortic 288:110384
Poland JA, Brown PJ, Sorrells ME, Jannink JL (2012) Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE 7(2):e32253. https://doi.org/10.1371/journal.pone.0032253
Prado M, Silva AV, Campos GR, Borges KLR, Yassue RM, Husein G, Akens FF, Sposito MB, Amorim L, Behrouzi P, Bustos-Korts D, Fritsche-Neto R (2024) Complementary approaches to dissect late leaf rust resistance in an interspecific raspberry population. Genes Genom Genet. https://doi.org/10.1093/g3journal/jkae202
Resende R, Resende M, Silva F, Azevedo C, Dapiaggi M, Soares L, Costa E, Martins R, Faria D, Neves L, Oliveira M, Lima B, Alves R, Lima F, Matrangolo W, Silva-Jr O, Grattapaglia D et al (2017) Assessing the expected response to genomic selection of individuals and families in Eucalyptus breeding with an additive-dominant model. Heredity 119:245–255. https://doi.org/10.1038/hdy.2017.37
Rooney TE, Kunze KH, Sorrells ME (2022) Genome-wide marker effect heterogeneity is associated with a large effect dormancy locus in winter malting barley. Plant Genom 15(4):e20247. https://doi.org/10.1002/tpg2.20247
Rossum BJ, Eeuwijk FA, Boer M, Malosetti M, Bustos-Korts D, Millet E, Paulo J (2023) statgenSTA: single trial analysis (STA) of field trials R Package Version 1 11
Runemark A, Vallejo-Marin M, Meier JI (2019) Eukaryote hybrid genomes. PLoS Genet 15(11):e1008404. https://doi.org/10.1371/journal.pgen.1008404
Tan B, Grattapaglia D, Martins GS et al (2017) Evaluating the accuracy of genomic prediction of growth and wood traits in two Eucalyptus species and their F1 hybrids. BMC Plant Biol 17:110. https://doi.org/10.1186/s12870-017-1059-6
Tan B, Grattapaglia D, Wu HX, Ingvarsson PK (2018) Genomic relationships reveal significant dominance effects for growth in hybrid Eucalyptus. Plant Sci 267:84–93. https://doi.org/10.1016/j.plantsci.2017.11.011
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423. https://doi.org/10.3168/jds.2007-0980
Wu PY, Ou JH, Liao CT (2023) Sample size determination for training set optimization in genomic prediction. Theor Appl Genet 136:57. https://doi.org/10.1007/s00122-023-04254-9
Würschum T, Maurer HP, Weissmann S, Hahn V, Leiser WL (2017) Accuracy of within- and among-family genomic prediction in triticale. Plant Breed 136:230–236. https://doi.org/10.1111/pbr.12465
Xu Y, Liu X, Fu J, Wang H, Wang J, Huang C, Prasanna BM, Olsen MS, Wang G, Zhang A (2019) Enhancing genetic gain through genomic selection: from livestock to plants. Plant Commun 1(1):100005. https://doi.org/10.1016/j.xplc.2019.100005
Yassue RM, Sabadin F, Galli G, et al. (2021) CV-α: designing validation sets to increase the precision and enable multiple comparison tests in genomic prediction. Euphytica 217:106. https://doi.org/10.1007/s10681-021-02831-x

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
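
Supplementary sketch. The Discussion contrasts two ways of composing training and testing sets under family structure: K-fold cross-validation with stratified sampling, in which every family is represented in every fold, and leave-one-family-out (LOFO), in which a whole family is held out. The Python sketch below illustrates only how such splits might be built. It is not the code used in the study; the function names, the family labels (F1 to F3), the family sizes, and the number of folds are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def stratified_kfold_by_family(families, k, seed=0):
    """Assign individuals to k folds so that each family is spread evenly across folds."""
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for idx, fam in enumerate(families):
        by_family[fam].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across families to keep fold sizes balanced
    for fam in sorted(by_family):
        members = by_family[fam]
        rng.shuffle(members)  # random order within the family
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


def leave_one_family_out(families):
    """Yield (family, train_idx, test_idx), holding out one whole family at a time."""
    for fam in sorted(set(families)):
        test = [i for i, f in enumerate(families) if f == fam]
        train = [i for i, f in enumerate(families) if f != fam]
        yield fam, train, test


# Hypothetical panel: three families of unequal size (labels are illustrative only).
families = ["F1"] * 6 + ["F2"] * 4 + ["F3"] * 5
folds = stratified_kfold_by_family(families, k=5)

for fam, train, test in leave_one_family_out(families):
    # Under LOFO the training set contains no relatives from the held-out family.
    print(fam, "train size:", len(train), "test size:", len(test))
```

In the stratified scheme each testing fold shares family members with the training set, whereas under LOFO the held-out family has no full-sibs in the training set. This lower relatedness is the mechanism the Discussion proposes for the lower accuracies observed with LOFO.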
