GBE

Patchwork: Alignment-Based Retrieval and Concatenation of Phylogenetic Markers from Genomic Data

Downloaded from https://academic.oup.com/gbe/article/15/12/evad227/7470721 by UFRJ - CFCH/IFCS user on 12 September 2024

Felix Thalén1,2, Clara G. Köhne1, and Christoph Bleidorn1,*

1Department for Animal Evolution and Biodiversity, Georg-August-Universität Göttingen, Göttingen 37073, Germany
2Cardio-CARE AG, Medizincampus Davos, Davos Wolfgang 7265, Switzerland

*Corresponding author: E-mail: [email protected].

Accepted: December 06, 2023

Abstract
Low-coverage whole-genome sequencing (also known as “genome skimming”) is becoming an increasingly affordable approach to large-scale phylogenetic analyses. While already routinely used to recover organellar genomes, genome skimming is rather rarely utilized for recovering single-copy nuclear markers. One reason might be that only few tools exist to work with this data type within a phylogenomic context, especially to deal with fragmented genome assemblies. We here present a new software tool called Patchwork for mining phylogenetic markers from highly fragmented short-read assemblies as well as directly from sequence reads. Patchwork is an alignment-based tool that utilizes the sequence aligner DIAMOND and is written in the programming language Julia. Homologous regions are obtained via a sequence similarity search, followed by a “hit stitching” phase, in which adjacent or overlapping regions are merged into a single unit. The novel sliding window algorithm trims away any noncoding regions from the resulting sequence. We demonstrate the utility of Patchwork by recovering near-universal single-copy orthologs within a benchmarking study, and we additionally assess the performance of Patchwork in comparison with other programs. We find that Patchwork allows for accurate retrieval of (putatively) single-copy genes from genome skimming data sets at different sequencing depths with high computational speed, outperforming existing software targeting similar tasks. Patchwork is released under the GNU General Public License version 3. Installation instructions, additional documentation, and the source code itself are all available via GitHub at https://github.com/fethalen/Patchwork.
Key words: genome skimming, low-coverage sequencing, museomics, phylogenomics, short reads, single-copy genes.

Significance
Even though current sequencing and computational methods allow for the completion of high-quality genomes for all life on earth, the availability of material for sequencing has become a major bottleneck in phylogenomic studies, especially since material stored in museum collections, or collected during barcoding campaigns, is often not suitable for reconstructing high-quality, highly continuous genomes. At the same time, the output of short-read sequencing machines is increasing, and prices for these techniques are dropping. Short-read data are still routinely used to recover organellar genomes, but this so-called genome skimming approach is rather rarely utilized for recovering single-copy nuclear markers. We present a new software tool called Patchwork for mining phylogenetic markers from highly fragmented genome assemblies, as well as directly from short sequence reads. We demonstrate the accuracy of this new approach and show in a benchmarking study that it also outperforms existing software for similar tasks. Patchwork allows users to compile preselected gene sets from low-coverage short-read sequencing data sets and is thereby ideally suited for including material from museum collections in phylogenomic studies.

© The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Genome Biol. Evol. 15(12) https://doi.org/10.1093/gbe/evad227 Advance Access publication 12 December 2023

Introduction

Advancements in high-throughput sequencing techniques have revolutionized the field of phylogenetics and ultimately our understanding of the tree of life (Lemmon and Lemmon 2013). The availability of genomic and transcriptomic data for basically all desired taxa and for a reasonable price has transformed the field into phylogenomics: genome-scale phylogenetic systematic analyses (McCormack et al. 2013). Some challenges remain, however, as many studies still show incongruent results, low branch support, or lacking resolution (Philippe et al. 2017; Steenwyk et al. 2023). Even though complete genomes are becoming available for more and more eukaryotes, the access to high-molecular-weight DNA is the bottleneck in the quest for sequencing genomes of all life on earth (Blom 2021; Dahn et al. 2022). Nowadays and in the past, most large-scale phylogenomic studies were conducted using either transcriptome sequencing or genome subsampling methods such as target enrichment, which focuses on a set of preselected loci (Bleidorn 2017).

Transcriptome sequencing offers a way to sequence only the expressed portion of a genome without prior sequence knowledge (Stark et al. 2019). Unfortunately, this approach requires freshly collected material or specifically stored material, e.g. deeply flash frozen or in RNAlater. Furthermore, smaller specimens may need to be pooled together to attain sufficient amounts of mRNA, and such practice risks mixing up individuals with undetected genetic variation. Unfortunately, a large number of collected specimens exist only in natural history museum collections, and most of these are ethanol preserved and thus not usable for transcriptomic studies (Call et al. 2021). As taxon sampling is considered one of the most important factors for accurate phylogenetic tree reconstruction (Heath et al. 2008), it would be a missed opportunity to leave the potential of natural history collections untapped. Target enrichment approaches, on the other hand, require prior knowledge of target sequences (e.g. from well-annotated genomes) for the construction of oligonucleotide probes. Moreover, the number of enriched targets is limited by the number of oligonucleotides included in the enrichment kit of choice, and the efficiency of such approaches decreases as the bait-to-target distance increases (Bragg et al. 2016). Another downside is that the data produced are difficult to reuse for other types of genomic or evolutionary studies.

A viable alternative to assemble taxon-rich phylogenomic data sets is low-coverage whole-genome sequencing (LC-WGS; also known as “shallow genome sequencing” or “genome skimming”) using short-read technologies such as Illumina sequencing (Dodsworth 2015). Relying solely on this approach has been shown to be inadequate for the reconstruction of highly contiguous reference-quality genomes (Rhie et al. 2021). However, due to the introduction of newer sequencing platforms (e.g. Illumina's NovaSeq sequencing platform), short-read WGS has become relatively cheap, and prices are expected to drop further with Ultima Genomics, another highly competitive sequencing platform, entering the market (Simmons et al. 2023). Moreover, short-read sequencing library construction also allows highly fragmented DNA to be used as input (Hu et al. 2021), thereby enabling the use of material from museum collections from all around the world (Raxworthy and Smith 2021). Consequently, LC-WGS can be used to generate data from various sources of targeted organisms to retrieve marker loci on a genome scale. While this so-called “genome skimming” approach has frequently been used to reconstruct organellar genomes or other high-copy fractions of eukaryote genomes (Richter et al. 2015; Jin et al. 2020), it seems currently underutilized to retrieve single-copy nuclear markers (Liu et al. 2021). One reason is that short-read assemblies of eukaryotic genomes tend to be highly discontinuous, and automated annotation of such large, fragmented genomes remains difficult (Salzberg 2019), as they are characterized by the presence of “genes in pieces,” where introns interrupt coding sequences (Rogozin et al. 2005). Depending on the coverage, short-read draft genomes are characterized by low N50s in the range of a few (if at all) kilobase pairs (kb; Salzberg et al. 2012), and consequently, exons of a single gene usually end up on several contigs.

The disuse of genome skimming in large-scale phylogenetics could potentially be ascribed to the lack of suitable data analysis methods (Zhang et al. 2019). Existing software tools for working with LC-WGS data in a phylogenomic context, such as aTRAM 2 (Allen et al. 2017, 2018), ALiBaSeq (Knyshov et al. 2021), and GeMoMa (Keilwagen et al. 2016, 2018), are either written in an interpreted language (e.g. Perl or Python) that does not allow the program to scale well with the large biological data sets that are commonplace today (e.g. aTRAM 2, ALiBaSeq) or need well-annotated reference genomes or transcriptomes (e.g. GeMoMa). A recent addition to the portfolio of available tools is Read2Tree, which directly infers trees from unassembled data (Dylus et al. 2023).

To address the limitations typically associated with working with genome skimming data, we present Patchwork, an alignment-based tool for mining phylogenetic markers directly from WGS data. Patchwork utilizes the sequence aligner DIAMOND (Buchfink et al. 2021) and is written in the programming language Julia (Bezanson et al. 2017) to achieve the best possible speed, thus allowing Patchwork to scale well with today's genome-scale data sets. In addition, our implementation focuses on ease of use, and our program handles each step in the analysis, from start to finish. Using our new approach, we targeted universal single-copy orthologs

FIG. 1.—Graphical overview of the Patchwork algorithm. First, a) query sequences are aligned to the provided reference sequence. These alignments may
or may not be overlapping. b) Overlapping alignments are realigned but only in the area in which they overlap. The best-scoring alignment is retained while all
others are discarded. c) Nonaligned residues are then removed, and d) the remaining regions are concatenated into a single, continuous sequence.
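
The four steps in this caption can be illustrated with a small Python sketch. It is not Patchwork's implementation (which is written in Julia and realigns overlapping regions); it only assumes gap-free hits and uses the whole-hit score as a stand-in for the per-overlap comparison. The `Hit` structure, the `stitch` function, and the example sequences are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One gap-free query-to-reference alignment (hypothetical structure)."""
    ref_start: int   # 0-based start position on the reference protein
    residues: str    # translated query residues, one per reference position
    score: float     # alignment score, e.g. a bit score


def stitch(hits: list[Hit]) -> str:
    """Merge hits into one sequence; the best-scoring hit wins in overlaps."""
    chosen: dict[int, str] = {}
    # Visit hits from lowest to highest score so that residues from
    # better-scoring hits overwrite those from worse-scoring hits.
    for hit in sorted(hits, key=lambda h: h.score):
        for offset, residue in enumerate(hit.residues):
            chosen[hit.ref_start + offset] = residue
    # Reference positions without any hit are skipped, which concatenates
    # the covered regions into a single continuous sequence.
    return "".join(chosen[pos] for pos in sorted(chosen))


hits = [
    Hit(ref_start=0, residues="MKTAY", score=50.0),
    Hit(ref_start=3, residues="GYIAK", score=80.0),   # overlaps the first hit
    Hit(ref_start=12, residues="QRQIS", score=40.0),  # separate region
]
stitched = stitch(hits)
print(stitched)
```

In the overlap at reference positions 3 and 4, the residues of the higher-scoring second hit are kept, and the uncovered positions 8 to 11 simply drop out of the stitched sequence.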

(USCOs), which are available based on careful analysis of a curated database (OrthoDB, www.orthodb.org). A set of 954 metazoan-specific USCOs has been validated against 364 metazoan genomes and shown to be indeed (i) single-copy and (ii) nearly universally present (Manni et al. 2021).

Results

Patchwork is a reference- and alignment-based method for mining phylogenetic markers from WGS data, using either assembled contigs or reads as input (Fig. 1). The aim of Patchwork is to capture multiexon or fragmented genes, scattered across different contigs or reads. One or more reference protein sequences guide the “stitching” process, where the best-scoring translated query nucleotide sequences for any given region are merged into continuous stretches of amino acid sequences. Merged sequences go through a masking step in which unaligned residues, ambiguous amino acid characters (letters that do not determine a unique amino acid; they are B, J, X, or Z, where B = D or N, J = I or L, X = unknown, and Z = E or Q), and stop codons are removed from query sequences. Optionally, the removal of stop codons and ambiguous amino acid characters may be skipped by providing the --retain-stops and --retain-ambiguous flags, respectively. Finally, Patchwork implements a sliding window–based alignment trimming step to remove poorly aligned residues (e.g. due to the presence of putative noncoding regions) from the resulting sequences. The output is available as nucleotide or amino acid sequences.

Benchmark

To assess the performance of our approach, we (i) test an ideal case where the query and reference species are identical, (ii) test a case where the query and reference are 2 distant species, and (iii) compare Patchwork v.0.5.1 with ALiBaSeq v.1.2 (Knyshov et al. 2021) and aTRAM v.2.4.3 (Allen et al. 2017). Throughout these benchmarks, we use Illumina short-read nucleotide sequences from the marine annelid Dimorphilus gyrociliatus (accession PRJEB37657 in the European Nucleotide Archive). A highly contiguous (N50 = 2.24 Mb) and complete (95.8% BUSCO genes recovered, metazoa_odb10) annotated version of the compact 73.8-Mb genome of this annelid is publicly available (Martín-Durán et al. 2021).

As we only used short-read data sets at different coverages for our benchmark analyses, we created highly discontinuous assemblies with low N50s as typical for
real-world low-coverage genomic data sets. We assembled these sequence reads using SPAdes and subsequently used Patchwork to search for near-USCOs (Seppey et al. 2019), using a preannotated set of USCOs from that same species as a reference. Next, we used that same assembly of D. gyrociliatus to search for USCOs, this time using USCOs from the leech Helobdella robusta as the reference, a clitellate annelid that diverged at least 400 mya from D. gyrociliatus (Erséus et al. 2020). Finally, we compared our program to ALiBaSeq (Knyshov et al. 2021) and aTRAM 2 (Allen et al. 2017). For this comparison, we also subsampled the aforementioned D. gyrociliatus short reads in order to simulate various sequencing coverages. We decided not to include the software GeMoMa (Keilwagen et al. 2016) in this comparison, as it heavily relies on the availability of reference genomic or transcriptomic data. Read2Tree (Dylus et al. 2023) has also not been included in the comparisons, as its focus is tree inference and not marker retrieval.

We compared the retrieved translated and stitched contigs, hereafter called “recovered markers,” to the reference D. gyrociliatus USCOs. For each reference sequence, the evaluation included percent identical positions out of all aligned positions as well as percent of reference sequence positions covered by the recovered markers. Patchwork automatically generates these statistics and produces a detailed output for each reference as well as an aggregated output over all references.

Effect of Genome Fragmentation on Accuracy

In the initial setup, we assessed the accuracy of Patchwork using a high-quality query assembly and 815 USCO reference sequences from the same species, D. gyrociliatus, thereby exploring the program's performance for the hypothetical case where the entire set of reference sequences should be recoverable as exactly matching stitched contigs from the query sequences. We retrieved all of the initial 815 markers. On average, 95.9% of all aligned positions were identical matches, with a mean query coverage of 92.2%; this equals a combined measure of 88.4% identical matches for all reference positions, whether aligned to query residues or not (Fig. 2 and Table 1).

FIG. 2.—Percent identity and query coverage in markers based on a Patchwork analysis of a SPAdes assembly of D. gyrociliatus, targeting 815 single-copy orthologs from the same species.

Table 1
Results from Patchwork when using a D. gyrociliatus SPAdes assembly as the query and USCOs from a long-read assembly of D. gyrociliatus as a reference

Variable           Mean      Min     Median   Max
Reference length   447.606   77      351.0    2,748
Query length       407.953   27      322.0    2,553
Matches            385.075   24      306.0    2,549
Mismatches         21.207    0       0.0      1,075
Deletions          42.131    0       5.0      1,097
Query coverage     92.181    5.22    98.71    100.0
Identity           95.887    30.91   100.0    100.0

Effect of Reference Divergence

In the second iteration, we aligned a set of high-coverage query assemblies against a very distant reference set, in order to estimate the program's performance when using highly divergent sequences as reference. For this purpose, the same D. gyrociliatus SPAdes assembly as in the previous evaluation served as query sequence set, and 957 near-USCOs from the annotated genome of the leech H. robusta were used as a reference. We retrieved 943 out of the 957 H. robusta reference sequences. Of these, 769 successfully aligned back to 1 and only 1 of the 778
D. gyrociliatus USCOs that were considered homologous to the H. robusta set (Fig. 3 and Table 2). For these 769 retrieved markers, the average percent identity measure was 89%, with a mean query coverage of 74.5%. Put differently, the recovered markers had an average of 67.2% identical matches against all reference positions.

FIG. 3.—Percent identity and query coverage in markers based on a Patchwork analysis of a SPAdes assembly of D. gyrociliatus, targeting 957 single-copy orthologs from the leech H. robusta.

Table 2
Results from Patchwork when using a D. gyrociliatus SPAdes assembly as the query and USCOs from H. robusta as a reference

Variable           Mean      Min     Median   Max
Reference length   448.025   77      351.0    2,748
Query length       309.93    31      259.0    2,326
Matches            249.126   15      195.0    2,326
Mismatches         30.234    0       7.0      355
Deletions          8.319     0       2.0      268
Query coverage     74.540    5.41    82.78    100.0
Identity           89.008    25.46   96.99    100.0

Note: The recovered markers were evaluated against the set of 778 D. gyrociliatus USCOs that were considered homologous to sequences in the H. robusta reference set.

Program Comparison

In the third setup, we compared the performance and runtime for Patchwork to that of ALiBaSeq (Knyshov et al. 2021) and aTRAM 2 (Allen et al. 2018), using a D. gyrociliatus short-read data set at different sequence coverage levels (1×, 3×, 5×, 10×, 20×, and 40×). While Patchwork can use both reads and assembled contigs as an input, ALiBaSeq uses assembled contigs, and aTRAM 2 is read based. Performance was assessed using a combined measure for accuracy and completeness of the recovered USCO markers, hereafter called “total percent identity.” Patchwork with D. gyrociliatus assembly data performs best over almost all data sets (Fig. 4), reaching as much as 62% total percent identity for data with at least 10× coverage. Only with a coverage of 1×, D. gyrociliatus read-based data seem to be better suited for marker retrieval. Cutoff thresholds during the assembly might lead to discarding part of the sequence data that is retained when using reads, therefore causing the latter to achieve higher query coverage for the 1× data set. Note that query coverage improves especially for read data when tantan masking in DIAMOND is disabled (i.e. by providing --masking 0 as an argument). For data sets with higher coverage, running Patchwork on read data still achieves well over 50% total percent identity. Using read data is therefore a valid option that could be considered if the compute resources necessary for assembling the sequences are scarce. The performance of Patchwork stays approximately constant for data sets with coverages of at least 10×, independent of the data type used. By comparison, ALiBaSeq achieves approximately 7% less total percent identity than Patchwork with assemblies for all data sets and performs only slightly better than Patchwork with read data for a coverage over 10×. aTRAM 2, on the other hand, performs comparatively poorly, with a maximum total percent identity of about 22% for the data set with 20× coverage. This is mostly due to the small number of recovered markers; the markers themselves generally have a high percent identity value. For a coverage of 1×, aTRAM 2 was unable to recover any stitched contigs at all. The program was also not evaluated for the data set of 40× coverage as it had not completed within the cluster's maximum runtime of 5 d.

Both Patchwork and ALiBaSeq are very fast; the programs terminated in under 5 min when using assembly

FIG. 4.—Accuracy and completeness of the recovered marker sequences for the different D. gyrociliatus data sets when run against a reference set of H.
robusta USCOs. Accuracy and completeness were jointly measured as percent identical out of all aligned positions multiplied with the total percentage of
aligned nongap positions. This integrated measure avoids a distorted performance estimation, e.g. due to small number of recovered markers but high percent
identity in the aligned positions. Patchwork was run with D. gyrociliatus assemblies unless indicated differently. ALiBaSeq received assemblies, while aTRAM 2
received reads as input.
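
The combined measure described in this caption can be written out as a short Python sketch. It assumes the measure is simply the product of two percentages; whether Patchwork multiplies the means or averages per-marker products is not stated here, so the function name and the second, illustrative input are hypothetical. The first input uses the mean identity and mean query coverage reported in Table 1.

```python
def total_percent_identity(percent_identity: float, percent_aligned: float) -> float:
    """Combine accuracy and completeness into a single percentage.

    percent_identity: identical positions out of all aligned positions (0-100).
    percent_aligned:  aligned nongap positions out of all positions (0-100).
    """
    return percent_identity * percent_aligned / 100.0


# Mean identity and mean query coverage from the same-species benchmark (Table 1).
same_species = total_percent_identity(95.887, 92.181)
print(round(same_species, 1))

# A run that recovers little sequence but aligns it almost perfectly still
# scores low (illustrative numbers, not taken from the paper).
few_markers = total_percent_identity(99.0, 20.0)
print(round(few_markers, 1))
```

The first value agrees with the 88.4% combined measure reported for the same-species benchmark, and the second shows why a high percent identity alone would overstate performance when only a small share of positions is recovered.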

FIG. 5.—Program runtime for each D. gyrociliatus data set. Patchwork was run both as a script and as a compiled program. It received D. gyrociliatus
assemblies unless indicated differently. ALiBaSeq was run on assemblies, while aTRAM 2 received reads as input.


[Figure 6 tree image not reproduced. Tip labels: Brueelia antiqua, Bothriometopus macronemis, Haematopinus macronemis, Proechionopthirus fluctus, Antarctophtirus microchir, Linognathus spicatus, Hoplopleura arborcicola, Neohaematopinus pacificus, Pedicinius badius, Pthirus gorillae, Pthirus pubis, Pediculus schaeffi B, Pediculus schaeffi A, Pediculus humanus A 2013, Pediculus humanus B 2013. Tree scale: 0.1.]

FIG. 6.—Phylogenetic analysis of Phthiraptera relationships as recovered from a Maximum Likelihood analysis of a combined supermatrix using USCOs as
recovered by Patchwork. Analysis was conducted using IQ-TREE 2 including model and partition finding. Bootstrap values from 1,000 pseudoreplicates are
given at the branches.
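
A combined supermatrix of the kind mentioned in this caption can be sketched in Python as follows. This is only an illustration of the general concatenation technique, not the code used by Patchwork or its helper scripts; the function name, the marker and taxon names, and the choice to fill missing taxa with gap characters are all assumptions.

```python
from __future__ import annotations


def build_supermatrix(
    markers: dict[str, dict[str, str]],
) -> tuple[dict[str, str], list[tuple[str, int, int]]]:
    """Concatenate per-marker alignments into one row per taxon.

    markers maps a marker name to {taxon: aligned sequence}. Taxa missing
    from a marker are padded with gaps. Returns the supermatrix and the
    partition boundaries as (marker, first column, last column), 1-based.
    """
    taxa = sorted({taxon for aln in markers.values() for taxon in aln})
    rows: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    partitions: list[tuple[str, int, int]] = []
    position = 0
    for name, aln in markers.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {name!r} is empty or not aligned")
        width = lengths.pop()
        for taxon in taxa:
            # Pad taxa that lack this marker with gap characters.
            rows[taxon].append(aln.get(taxon, "-" * width))
        partitions.append((name, position + 1, position + width))
        position += width
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


example = {
    "marker_1": {"taxon_A": "MKT", "taxon_B": "MRT"},
    "marker_2": {"taxon_A": "GY-L"},  # taxon_B lacks this marker
}
matrix, parts = build_supermatrix(example)
for taxon, row in matrix.items():
    print(taxon, row)
print(parts)
```

Keeping the partition boundaries alongside the matrix is what allows a downstream program to fit a separate model to each marker.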

data (Fig. 5). The runtimes fluctuated only slightly between data sets. Using Patchwork with reads required more time for larger data sets, but even for the largest evaluated data set, it finished after half an hour. By comparison, running aTRAM 2 took days for all data sets.

Patchwork in a Phylogenomic Context

To demonstrate how our software could be utilized in a phylogenomic pipeline, we used it to retrieve a set of 957 metazoan-specific USCOs from a phthirapteran data set (Allen et al. 2017). When reusing a set of 15 lice Operational Taxonomic Units (Hexapoda and Phthiraptera), we were able to retrieve all 957 USCOs, for all taxa. The resulting alignments contained few gaps for any marker; i.e. most markers were well above the 90% aligned position trimming threshold. The trimmed alignment contained 3,454,320 positions in total, compared to 5,383,303 before trimming (i.e. ∼64% positions were retained after trimming). Our phylogenetic reconstruction resulted in a well-supported tree (Fig. 6), which is largely congruent with the original analysis (Allen et al. 2017), with the exception of the position of Haematopinus macronemis. However, this placement is the only part of the tree that is not well supported, and the reasons for the incongruence are unclear; one possibility is a slightly different choice of phylogenetic markers. In general, however, the approach worked very well, and for 951 of 954 USCOs, nearly complete exonic data could be retrieved.

Discussion

Patchwork is a new software tool for quickly mining phylogenetic markers from WGS data. Since Patchwork can retrieve homologous regions even in distantly related taxa, this program lends itself especially well to recovering phylogenetic markers for phylogenomic studies. It is simultaneously an efficient way of increasing marker occupancy in poorly assembled genomes and/or in the presence of multilocus exons. Finally, Patchwork allows the user to combine 2 different data types (i.e. transcriptomic and genomic data) into a single data set, thus further enabling an even larger taxon sampling and encouraging data reusability.

Special consideration should be taken to avoid the creation of chimeric sequences. One way in which such sequences may arise is when orthologous (i.e. genes related via a speciation event) and paralogous (i.e. genes related via a gene duplication event) sequences are merged together. To circumvent this issue, we recommend that the user limits the use of reference sequences to near-USCOs. Different lineage-specific sets of such USCOs are available based on carefully analyzed sets of homologous genes from a curated database (Manni et al. 2021). Besides their use in evaluating the quality of genomic and metagenomic data, USCOs have also become prominent as preselected marker sets in phylogenomic analyses (Sahbou et al. 2022) and have been recently proposed as a unifying framework for DNA-based species delimitation (Dietz et al. 2023). Many programs, e.g. the aforementioned program BUSCO, exist for retrieving such sequences from an already assembled
genome, and these could be used as reference sequences github.com/fethalen/Patchwork), is distributed under
(Waterhouse et al. 2018). Additionally, several downstream the GPLv3 license, and targets both Linux and macOS
analysis tools are available to control for the presence of (Windows users may run Patchwork by using the
possible cross-contamination, (unexpected) paralogous Windows Subsystem for Linux).
copies, or other artifacts confounding systematic studies In order to facilitate reproducibility, a Docker container
(Lozano-Fernandez 2022). To control for the possible arti (Merkel 2014) of Patchwork is also distributed via the
factual inclusion of stretches of noncoding sequences, the BioContainers framework (da Veiga Leprevost et al.

Downloaded from https://academic.oup.com/gbe/article/15/12/evad227/7470721 by UFRJ - CFCH/IFCS user on 12 September 2024

tool PREQUAL could be used to detect such and remove 2017). Similarly, we also provide an Apptainer definition
such regions (Whelan et al. 2018). Finally, multiple align file for users of the Apptainer/Singularity platform.
ment tools such as MACSE (Ranwez et al. 2018) can be Apptainer (formerly known as Singularity; Kurtzer et al.
used to deal with putative confounding problems from 2017) is another container platform that targets shared sys
the occurrence of premature stop codons, which might oc tems such as High-Performance Computing platforms,
cur when working with data with coverage genomic data. which are commonplace at universities today.
The accuracy and the robustness of the results depends Most phylogenomic studies include more than a handful
on how closely related the target and the reference species of taxa, and concatenating these manually gets increasingly
under study are. The difficulty stems from the ability to ac tedious as the data set size increases. Therefore, Patchwork
curately predict noncoding regions in aligned contigs; be also includes a set of complementary tools for streamlining
cause alignment trimming relies on gap-excluded identity, the downstream analysis. For example, the script multi_
choosing the correct cutoff threshold becomes increasingly patchwork.sh lets the user run Patchwork on multiple input
easier as the level of identity approaches 100% (the identity files and concatenate homologous sequences from differ
of noncoding regions is likely to stay the same, while the ent taxa into 1 file.
identity to coding regions increases). On the upside, high-
quality genomes for practically all major lineages exist and
are readily available online (Formenti et al. 2022). Initial Alignment and Database Construction
Moreover, and not surprisingly, the coverage of the input First, all reference protein sequences, regardless of whether
read data sets correlates with the performance of retrieving single-copy marker genes. Similar to a previous study (Liu et al. 2021), we also find that a coverage of 10× and more should be targeted when designing genome skimming studies. However, as seen in the proof of principle, even lower coverages enable the construction of phylogenomic data matrices. For very low-coverage data sets, the read-based mode outperforms assembly-based analyses. For the latter, assembly size seems to be more important than contiguity.

In summary, Patchwork allows the retrieval of (putatively) single-copy genes from genome skimming data sets at different sequencing coverage with high computational speed. Availability and quality of biological specimens are becoming the major bottleneck for phylogenomic studies. Especially for phylogenomic studies relying on collection-based material, Patchwork offers a fast and efficient way for marker retrieval from short-read sequence data sets.

Materials and Methods

Patchwork is implemented in Julia (Bezanson et al. 2017), a just-in-time (JIT)–compiled programming language that is typically faster than interpreted languages such as Python or R. Existing Julia bioinformatics packages such as BioAlignments.jl (https://github.com/BioJulia/BioAlignments.jl) and BioSequences.jl (https://github.com/BioJulia/BioSequences.jl) were used to speed up the development process. Patchwork is obtainable from GitHub (https://

they are spread across multiple FASTA files or not, are pooled together into a single FASTA file, from which a DIAMOND database is created. There is also the option to use an existing DIAMOND-formatted database or a BLAST output file in a tabular format by using the --database or --tabular options, respectively. These files are both provided in the output of Patchwork and can thus be reutilized when trying out different parameters. In either case, DIAMOND's BLASTX algorithm is used to align translated nucleotide sequences to 1 or more reference protein sequences.

Like DIAMOND, Patchwork, by default, scores alignments using the substitution matrix BLOSUM62 (Henikoff and Henikoff 1996), a gap open penalty of 11, and a gap extension penalty of 1. Other built-in or custom substitution matrices may be used in place of the default option. User-chosen gap open penalties and gap extension penalties may also be set, as long as they fall within the limits set by the substitution matrix of choice. For the users’ convenience, Patchwork supports a number of different DIAMOND options that can usually be provided in the same manner as in DIAMOND itself.

For all Patchwork benchmarks, we observed that disabling DIAMOND's tantan masking (Frith 2011), by setting --masking 0, as described in Table 2, yielded higher query coverages. This effect was more pronounced for read data sets but could also be detected in assembled data sets. On the other hand, the number of exact matches in all aligned positions (i.e. percent identity) between the query and the reference decreased slightly. When

8 Genome Biol. Evol. 15(12) https://doi.org/10.1093/gbe/evad227 Advance Access publication 12 December 2023

combining both measures, however, disabling tantan masking improved the overall results.

Since the alignment search is likely to result in more than 1 hit per reference region, certain measures are taken to ensure that none of these hits are overlapping. These are “hit stitching” (also known as contig or exon stitching; i.e. merging of overlapping regions), removal of unaligned residues, and concatenation of nonoverlapping regions.

Hit Stitching

During “hit stitching,” all alignments made between the query region and the target sequence are merged in a way such that only the highest-scoring segment pair (HSP) for each region is retained. This results in a single, continuous sequence, and, as a consequence, some hits may be removed entirely (see also Fig. 1).

The “hit stitching” algorithm works as follows: First, query regions are sorted according to how they align to the target sequence, from first to last, and are added to the stack. Next, each pair of query regions on the stack is checked for overlaps. In case of an overlap, first, all regions are sorted by their first and last position at which they align to the reference sequence. The first region is added to the stack. Its start and end coordinates are then compared with those of the following region to check if they are overlapping. If they are not overlapping, the next adjacent region is added to the stack and compared with the following region. If they are overlapping, however, the region that is currently at the top of the stack is removed. The overlapping parts of this region and the next region are realigned to identify the best-scoring sequence at that particular interval. Then, based on the realignment score, the sequences are sliced such that the best-scoring sequence is retained at the overlapping region and so that the nonoverlapping, flanking parts of both regions, if existing, are preserved as well. Thus, a maximum of 3 sliced region parts are then added to the stack as new, separate regions: the sequence part preceding the overlap, which originates from the first region; the highest-scoring sequence at the overlap, which may be from either of the 2; and the sequence part that follows the overlap, which originates from the second region. The algorithm then continues in the same manner, comparing the topmost region of the stack with the following region, until all overlaps are removed and all regions have been added to the stack. This procedure may require multiple iterations, since in every run, only each pair of consecutive regions is compared and merged.

Different aligned regions from the same contig are allowed to be stitched together. While “hit stitching” may result in the creation of chimeric sequences (i.e. 2 or more biological sequences incorrectly joined together), this procedure has the potential to increase coverage and to (correctly) join 2 or more regions that are located on separate contigs due to incomplete assembly or sequencing errors.

Alignment Masking

At this step, unaligned residues, ambiguous amino acid characters, and stop codons (also known as “termination codons”) are all removed from the resulting query sequence. Query sequences may contain residues that do not align to any particular region of the subject sequence. Such regions may be noncoding regions or simply insertions. In either case, unaligned residues are removed on the basis that inserts are less likely to constitute phylogenetically informative sites and risk introducing untranslated regions and therefore biasing the downstream analysis. Similarly, ambiguous amino acids are most likely noninformative, and stop codons are a clear indicator that noncoding characters have been included in the alignment. Although such regions are likely to be removed in the subsequent step (see above), the user may choose to keep stop codons and/or ambiguous amino acid characters by providing the flags --retain-stops and/or --retain-ambiguous.

Sliding Window–Based Alignment Trimming

One side effect of aligning translated nucleotide sequences to amino acid sequences is that one might recover noncoding portions of DNA, provided that the following 2 conditions are fulfilled: (i) the noncoding DNA is located in between 2 or more coding portions and (ii) there is a sequence region in the reference sequence that the noncoding region can align to. In the resulting alignment, noncoding portions are characterized by many indels, interspersed with occasional matches. The alignment of noncoding portions of DNA can already be observed in the alignments produced by DIAMOND, and thus, this side effect does not stem from Patchwork itself. In fact, the Patchwork algorithm will only include noncoding parts if nothing else aligns better to the affected region of the reference sequence.

To mitigate this effect, we have implemented a sliding window–based alignment trimming approach to rid the alignments of these unwanted regions. This works by scanning the alignment from left to right, cutting all regions where the average distance between query and reference is above the user-provided distance threshold. The window size and the distance threshold can both be set by the user, but need not be, since we implemented default values for both. This step can also be skipped over in its entirety. This approach tries to avoid cases where a single bad, but correct, match would have otherwise been cut out.

Concatenation and Realignment of Remaining Regions

Finally, the resulting set of ordered, nonoverlapping sequence regions are concatenated into 1 continuous


sequence. The concatenated sequence is then realigned to the reference to obtain the final output sequence and alignment score.

Benchmark

Patchwork v.0.5.1 was continuously run using Julia v.1.8.2 and DIAMOND v.2.0.13 (Buchfink et al. 2021), with the options --ultra-sensitive --frameshift 15 --masking 0. All analyses were performed on the high-performance computing cluster maintained by the Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), running the Scientific Linux release 7.9 (Nitrogen) operating system with a Linux kernel of version 3.10.0. All runs were allocated 32 Intel Xeon Platinum 9242 CPUs running at 2.30 GHz. Elapsed time was calculated as reported by Slurm.

Effect of Genome Fragmentation on Accuracy

A publicly available set of Illumina short-read sequences of D. gyrociliatus (Martín-Durán et al. 2021) was used for the query set. We used SPAdes v.3.15.3 to generate the de novo assembly, using a K-mer size of 55. A set of 815 D. gyrociliatus USCOs retrieved from the published high-quality genome assembly (GenBank, accession: GCA_904063045.1) served as the reference.

Effect of Reference Divergence

We reused the de novo assembly from the previous evaluation for the query, while a set of 957 near-USCOs from the annotated genome of the leech H. robusta (GenBank, accession: GCA_000326865.1) were used as reference sequences. We used the same parameter settings described above. For this evaluation, we did not use Patchwork's own accuracy and completeness assessment, because the true number of identical matches and the amount of query coverage are not known between the 2 divergent species D. gyrociliatus and H. robusta. We therefore chose to compare the recovered markers to a subset of the D. gyrociliatus USCOs described in the previous benchmark. More specifically, only those D. gyrociliatus USCOs that produced a hit when searching against the H. robusta USCOs with DIAMOND v2.0.13 in ultrasensitive mode were used, since only these were considered “recoverable” in this setup. The resulting D. gyrociliatus USCO set contains 778 sequences; 37 sequences were discarded. The set of recovered markers was searched against the reference USCO set using DIAMOND in --ultra-sensitive mode. For each reference sequence, we retrieved only the marker that produced the highest bit score during the alignment step. We then evaluated percent identical positions out of all aligned positions as well as percent of reference sequence positions covered by the recovered markers.

Program Comparison

In order to generate data sets at different sequencing coverages, we subsampled the trimmed D. gyrociliatus reads downloaded from NCBI GenBank. Corresponding read pairs were selected randomly from the paired-end data. Subsampling was done using Subsample.jl, a Julia package distributed together with Patchwork. The resulting data sets have coverages of 1×, 3×, 5×, 10×, 20×, and 40×. For each of the data sets, we produced a short-read-only de novo assembly, as ALiBaSeq is designed for assembly data, while aTRAM 2 requires read data and Patchwork can process both. We used the assembler SPAdes v.3.15.3 (Nurk et al. 2013), with a K-mer size of 33, and the quality of the assembly was assessed using QUAST v.5.0.2 (Gurevich et al. 2013). We aligned the D. gyrociliatus reads and assemblies against the same set of H. robusta USCOs mentioned before.

ALiBaSeq v.1.2 was run with the D. gyrociliatus assemblies described above. The program requires BLAST; the version used here was 2.11.0. The program builds a database from the D. gyrociliatus sequences and searches this database with the H. robusta sequences before stitching the hits together. We set the parameters according to the guide for a protein-based search without reciprocal search, as explained in their documentation on GitHub (see the README file): -x a [extract all hits and join into (super)contigs] -f S [single alignment table (TBLASTN result file)] -e 1e-10 [e-value cutoff for further processing of TBLASTN hits] -c 1 [extract single best (super)contig] --amalgamate-hits [scoring scheme for (super)contigs] --is [enable contig stitching] --ac aa-tdna [search protein “baits” (H. robusta USCOs) against tDNA “target” database (D. gyrociliatus reads)].

We ran aTRAM v.2.4.3 with the sampled D. gyrociliatus read data sets. The program further requires BLAST, as well as a de novo assembler, and exonerate. We used BLAST v.2.11.0 and exonerate v.2.2.0 (Slater and Birney 2005) and employed SPAdes v.3.15.3 for the assembly step. The full aTRAM 2 pipeline consists of 3 consecutive steps: Firstly, the preparation of a database from the D. gyrociliatus reads, secondly, the assembly of different loci, and lastly, a reference-guided stitching process. The parameter settings for the core module of aTRAM 2 as well as the stitcher were as follows: --evalue 1e-10 --file-filter “*.filtered contigs.fasta” --overlap N.

Patchwork v.0.5.1 was run with both the sampled D. gyrociliatus read data sets and the assemblies we produced for these sampled read data. We ran the uncompiled program using Julia v.1.8.2 as well as the compiled version on each data set in order to perform runtime comparisons. Patchwork achieves all its objectives in a 1-step procedure, i.e. can be called with a single command, unlike ALiBaSeq and aTRAM 2. The program builds a


DIAMOND database from the H. robusta sequences and, after obtaining D. gyrociliatus hits, proceeds to stitch them together. All nondefault parameter settings for Patchwork were as described above. They were used for both read-based runs and assembly-based runs.

We ran the 3 programs with their respective parameter settings on the different D. gyrociliatus data sets against the H. robusta USCO set described above, which contains 957 sequences. The aTRAM 2 run for the 40× coverage read data set was ended prematurely because it had not terminated after 5 d. In a following step, the recovered markers produced by each program for each data set were evaluated with respect to completeness and accuracy of the resulting sequences by comparing them to the same set of 778 D. gyrociliatus USCOs mentioned above, again because only this subset could be recovered by the programs in this setup. ALiBaSeq and aTRAM 2 output DNA sequences that contain the ambiguous nucleotide N in all positions that could not be recovered during stitching. These N were removed for the subsequent evaluation steps because they distort the query coverage measure; the amount of a reference sequence covered by the recovered marker is artificially increased due to the uninformative inserted N.

Completeness and accuracy were measured jointly as percent identical aligned positions multiplied with the total amount of aligned, or recovered, positions (here called $p_{\mathrm{identical,\,cov}}$):

$$p_{\mathrm{identical,\,cov}} = \frac{n_{\mathrm{match}}}{n_{\mathrm{aligned}}} \cdot \mathrm{cov}.$$

$$\mathrm{cov} = \frac{\sum_{s_{\mathrm{recovered}}} \mathrm{length}(s_{\mathrm{recovered}})}{\sum_{s_{\mathrm{USCOs}}} \mathrm{length}(s_{\mathrm{USCOs}})}.$$

$n_{\mathrm{match}}$ being the total number of exact matches in all alignments between recovered markers and reference USCOs and $n_{\mathrm{aligned}}$ the total number of aligned, i.e. nongap, positions. The coverage cov was computed as the ratio of the total lengths of all recovered markers $s_{\mathrm{recovered}}$ and all reference USCOs $s_{\mathrm{USCOs}}$. We chose to combine the measures for accuracy of the recovered markers, i.e. percent identical out of all aligned positions, and completeness or query coverage, i.e. percent recovered positions, in order to avoid a distorted outcome. For example, a program might recover only a very small number of markers but these with high percent identity, such that using only the percent identity measure would have resulted in an overestimation of the program's performance.

Patchwork in a Phylogenomic Context

We retrieved the raw reads from the NCBI Sequence Read Archive (SRA accessions SRR5088465, SRR5088468, SRR5308129, SRR5308123, SRR5088469, SRR5088471, SRR5088472, SRR5088473, SRR1182279, SRR5308136, SRR5308138, SRR5088474, SRR5088475, SRR5308112, and SRR5088466) using prefetch, vdb-validate, and fasterq-dump (with the flag --split-spot), all from the NCBI SRA toolkit (Leinonen et al. 2011). We ran Patchwork v.0.5.1 with each of the specimens as query input and a set of 957 near-USCOs from the leech H. robusta as reference sequences (see Table 2 for parameter settings). A multiple sequence alignment (MSA) was constructed for each of these 957 loci using MAFFT (Katoh and Standley 2013) with the options --globalpair --ep 0.123. The resulting alignments were trimmed with trimAl (Capella-Gutiérrez et al. 2009), removing all positions with more than 90% gaps but retaining at least 60% of each alignment (options -gt 0.9 and -cons 60, respectively). We used FASconCAT-G (Kück and Longo 2014) to concatenate the trimmed alignments into a supermatrix. This supermatrix was then input into IQ-TREE 2 (Minh et al. 2020), alongside its corresponding gene partition file, to reconstruct the phylogeny using the maximum likelihood (ML) approach. We ran IQ-TREE 2 with extended model selection and tree inference, calculating 1,000 replicates for the ultrafast bootstrap (command line options -m MFP and -B 1000, respectively).

Acknowledgments

This work used the Scientific Compute Cluster at GWDG, the joint data center of Max Planck Society for the Advancement of Science (MPG) and University of Göttingen. We acknowledge support by the Open Access Publication Funds of the Göttingen University.

Funding

This work was supported by the German Research Foundation (DFG) BL787/8-1.

Data Availability

The data underlying this article are available via GitHub at https://github.com/animal-evolution-and-biodiversity/benchmarking-patchwork. Patchwork is distributed under the GPLv3 license via GitHub at https://github.com/fethalen/patchwork.

Literature Cited

Allen JM, Boyd B, Nguyen NP, Vachaspati P, Warnow T, Huang DI, Grady PGS, Bell KC, Cronk QCB, Mugisha L, et al. Phylogenomics from whole genome sequences using aTRAM. Syst Biol. 2017:66(5):786–798. https://doi.org/10.1093/sysbio/syw105.
Allen JM, LaFrance R, Folk RA, Johnson KP, Guralnick RP. aTRAM 2.0: an improved, flexible locus assembler for NGS data. Evol Bioinform. 2018:14:1176934318774546. https://doi.org/10.1177/1176934318774546.


Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: a fresh approach to numerical computing. SIAM Rev. 2017:59(1):65–98. https://doi.org/10.1137/141000671.
Bleidorn C. Phylogenomics. An introduction. Cham: Springer International Publishing; 2017.
Blom MPK. Opportunities and challenges for high-quality biodiversity tissue archives in the age of long-read sequencing. Mol Ecol. 2021:30(23):5935–5948. https://doi.org/10.1111/mec.15909.
Bragg JG, Potter S, Bi K, Moritz C. Exon capture phylogenomics: efficacy across scales of divergence. Mol Ecol Res. 2016:16(5):1059–1068. https://doi.org/10.1111/1755-0998.12449.
Buchfink B, Reuter K, Drost H-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature Meth. 2021:18(4):366–368. https://doi.org/10.1038/s41592-021-01101-x.
Call E, Mayer C, Twort V, Dietz L, Wahlberg N. Museomics: phylogenomics of the moth family Epicopeiidae (Lepidoptera) using target enrichment. Insect Syst Divers. 2021:5(2):6. https://doi.org/10.1093/isd/ixaa021.
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. Trimal: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009:25(15):1972–1973. https://doi.org/10.1093/bioinformatics/btp348.
Dahn HA, Mountcastle J, Balacco J, Winkler S, Bista I, Schmitt AD, Pettersson OV, Formenti G, Oliver K, Smith M, et al. Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing. GigaScience 2022:11:giac068. https://doi.org/10.1093/gigascience/giac068.
da Veiga Leprevost F, Grüning BA, Alves Aflitos S, Röst HL, Uszkoreit J, Barsnes H, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics 2017:33(16):2580–2582. https://doi.org/10.1093/bioinformatics/btx192.
Dietz L, Eberle J, Mayer C, Kukowka S, Bohacz C, Baur H, Espeland M, Huber BA, Hutter C, Mengual X, et al. Standardized nuclear markers improve and homogenize species delimitation in Metazoa. Methods Ecol Evol. 2023:14(2):543–555. https://doi.org/10.1111/2041-210X.14041.
Dodsworth S. Genome skimming for next-generation biodiversity analysis. Trends Plant Sci. 2015:20(9):525–527. https://doi.org/10.1016/j.tplants.2015.06.012.
Dylus D, Altenhoff A, Majidian S, Sedlazeck FJ, Dessimoz C. Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree. Nat Biotechnol. 2023. https://doi.org/10.1038/s41587-023-01753-4.
Erséus C, Williams BW, Horn KM, Halanych KM, Santos SR, James SW, Des Creuzé Châtelliers M, Anderson FE. Phylogenomic analyses reveal a Palaeozoic radiation and support a freshwater origin for clitellate annelids. Zool Scr. 2020:49(5):614–640. https://doi.org/10.1111/zsc.12426.
Formenti G, Theissinger K, Fernandes C, Bista I, Bombarely A, Bleidorn C, Ciofi C, Crottini A, Godoy JA, Höglund J, et al. The era of reference genomes in conservation genomics. Trends Ecol Evol. 2022:37(3):197–202. https://doi.org/10.1016/j.tree.2021.11.008.
Frith MC. A new repeat-masking method enables specific detection of homologous sequences. Nucleic Acids Res. 2011:39(4):e23. https://doi.org/10.1093/nar/gkq1212.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 2013:29(8):1072–1075. https://doi.org/10.1093/bioinformatics/btt086.
Heath T, Hedtke SM, Hillis DM. Taxon sampling and the accuracy of phylogenetic analyses. J Syst Evol. 2008:46:239–257. https://doi.org/10.3724/SP.J.1002.2008.08016.
Henikoff JG, Henikoff S. Blocks database and its applications. Meth Enzymol. 1996:266:88–105. https://doi.org/10.1016/S0076-6879(96)66008-X.
Hu T, Chitnis N, Monos D, Dinh A. Next-generation sequencing technologies: an overview. Human Immunol. 2021:82(11):801–811. https://doi.org/10.1016/j.humimm.2021.02.012.
Jin J-J, Yu W-B, Yang J-B, Song Y, dePamphilis CW, Yi T-S, Li D-Z. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020:21(1):241. https://doi.org/10.1186/s13059-020-02154-5.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013:30(4):772–780. https://doi.org/10.1093/molbev/mst010.
Keilwagen J, Hartung F, Paulini M, Twardziok SO, Grau J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinformatics 2018:19(1):189. https://doi.org/10.1186/s12859-018-2203-5.
Keilwagen J, Wenk M, Erickson JL, Schattat MH, Grau J, Hartung F. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 2016:44(9):e89. https://doi.org/10.1093/nar/gkw092.
Knyshov A, Gordon ERL, Weirauch C. New alignment-based sequence extraction software (ALiBaSeq) and its utility for deep level phylogenetics. PeerJ 2021:9:e11019. https://doi.org/10.7717/peerj.11019.
Kück P, Longo GC. FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies. Frontiers Zool. 2014:11(1):81. https://doi.org/10.1186/s12983-014-0081-x.
Kurtzer GM, Sochat V, Bauer MW. Singularity: scientific containers for mobility of compute. PLoS One 2017:12(5):e0177459. https://doi.org/10.1371/journal.pone.0177459.
Leinonen R, Sugawara H, Shumway M. The sequence read archive. Nucleic Acids Res. 2011:39(Database):D19–D21. https://doi.org/10.1093/nar/gkq1019.
Lemmon EM, Lemmon AR. High-throughput genomic data in systematics and phylogenetics. Annu Rev Ecol Evol Syst. 2013:44(1):99–121. https://doi.org/10.1146/annurev-ecolsys-110512-135822.
Liu B-B, Liu B-B, Ma Z-Y, Ren C, Hodel RGJ. Capturing single-copy nuclear genes, organellar genomes, and nuclear ribosomal DNA from deep genome skimming data for plant phylogenetics: a case study in Vitaceae. Appl Plant Sci. 2021:11(4):e11537. https://doi.org/10.1111/jse.12806.
Lozano-Fernandez J. A practical guide to design and assess a phylogenomic study. Genome Biol Evol. 2022:14(9):evac129. https://doi.org/10.1093/gbe/evac129.
Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021:38(10):4647–4654. https://doi.org/10.1093/molbev/msab199.
Martín-Durán JM, Vellutini BC, Marlétaz F, Cetrangolo V, Cvetesic N, Thiel D, Henriet S, Grau-Bové X, Carrillo-Baltodano AM, Gu W, et al. Conservative route to genome compaction in a miniature annelid. Nat Ecol Evol. 2021:5(2):231–242. https://doi.org/10.1038/s41559-020-01327-6.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 2013:66(2):526–538. https://doi.org/10.1016/j.ympev.2011.12.007.
Merkel D. Docker: lightweight linux containers for consistent development and deployment. Linux J. 2014:239(2):2. https://doi.org/10.5555/2600239.2600241.


Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020:37(5):1530–1534. https://doi.org/10.1093/molbev/msaa015.
Nurk S, Bankevich A, Antipov D, Gurevich AA, Korobeynikov A, Lapidus A, Prjibelski AD, Pyshkin A, Sirotkin A, Sirotkin Y, et al. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol. 2013:20(10):714–737. https://doi.org/10.1089/cmb.2013.0084.
Philippe H, de Vienne DM, Ranwez V, Roure B, Baurain D. Pitfalls in supermatrix phylogenomics. Eur J Taxon. 2017:283:1–25. https://doi.org/10.5852/ejt.2017.283.
Ranwez V, Douzery EJP, Cambon C, Chantret N, Delsuc F. MACSE v2: toolkit for the alignment of coding sequences accounting for frameshifts and stop codons. Mol Biol Evol. 2018:35(10):2582–2584. https://doi.org/10.1093/molbev/msy159.
Raxworthy CJ, Smith BT. Mining museums for historical DNA: advances and challenges in museomics. Trends Ecol Evol. 2021:36(11):1049–1060. https://doi.org/10.1016/j.tree.2021.07.009.
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 2021:592(7856):737–746. https://doi.org/10.1038/s41586-021-03451-0.
Richter S, Schwarz F, Hering L, Böggemann M, Bleidorn C. The utility of genome skimming for phylogenomic analyses as demonstrated for glycerid relationships (Annelida, Glyceridae). Genome Biol Evol. 2015:7(12):3443–3462. https://doi.org/10.1093/gbe/evv224.
Rogozin IB, Sverdlov AV, Babenko VN, Koonin EV. Analysis of evolution of exon-intron structure of eukaryotic genes. Brief Bioinformatics 2005:6(2):118–134. https://doi.org/10.1093/bib/6.2.118.
Sahbou A-E, Iraqi D, Mentag R, Khayi S. BuscoPhylo: a webserver for Busco-based phylogenomic analysis for non-specialists. Sci Rep. 2022:12(1):17352. https://doi.org/10.1038/s41598-022-22461-0.
Salzberg SL. Next-generation genome annotation: we still struggle to get it right. Genome Biol. 2019:20(1):92. https://doi.org/10.1186/s13059-019-1715-2.
Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz MC, Delcher AL, Roberts M, et al. GAGE: a critical evaluation of genome assemblies and assembly algorithms. Genome Res. 2012:22(3):557–567. https://doi.org/10.1101/gr.131383.111.
Seppey M, Manni M, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness. Methods Mol Biol. 2019:1962:227–245. https://doi.org/10.1007/978-1-4939-9173-0_14.
Simmons SK, Lithwick-Yanai G, Adiconis X, Oberstrass F, Iremadze N, Geiger-Schuller K, Thakore PI, Frangieh CJ, Barad O, Almogy G, et al. Mostly natural sequencing-by-synthesis for scRNA-seq using Ultima sequencing. Nat Biotechnol. 2023:41(2):204–211. https://doi.org/10.1038/s41587-022-01452-6.
Slater GSC, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 2005:6(1):31. https://doi.org/10.1186/1471-2105-6-31.
Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat Rev Genet. 2019:20(11):631–656. https://doi.org/10.1038/s41576-019-0150-2.
Steenwyk JL, Li Y, Zhou X, Shen XX, Rokas A. Incongruence in the phylogenomics era. Nat Rev Genet. 2023:24(12):834–850. https://doi.org/10.1038/s41576-023-00620-x.
Waterhouse RM, Seppey M, Simão FA, Manni M, Ioannidis P, Klioutchnikov G, Kriventseva EV, Zdobnov EM. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol. 2018:35(3):543–548. https://doi.org/10.1093/molbev/msx319.
Whelan S, Irisarri I, Burki F. PREQUAL: detecting non-homologous characters in sets of unaligned homologous sequences. Bioinformatics 2018:34(22):3929–3930. https://doi.org/10.1093/bioinformatics/bty448.
Zhang F, Ding Y, Zhu C-D, Zhou X, Orr MC, Scheu S, Luan Y-X. Phylogenomics from low-coverage whole-genome sequencing. Methods Ecol Evol. 2019:10(4):507–517. https://doi.org/10.1111/2041-210X.13145.

Associate editor: Dennis Lavrov
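
The overlap resolution at the heart of “hit stitching” (Materials and Methods) can be illustrated with a minimal Python sketch. It is not Patchwork's Julia implementation and rests on simplifying assumptions: hits are ungapped, so each query residue maps to exactly one reference position; the number of identities with the reference stands in for the realignment score; and the up to 3 sliced parts are joined immediately instead of being pushed back onto the stack as separate regions. The names Hit, score, and stitch are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    start: int  # 0-based start on the reference (inclusive)
    seq: str    # ungapped query residues, one per reference position (assumption)

    @property
    def end(self) -> int:
        """End coordinate on the reference (exclusive)."""
        return self.start + len(self.seq)

    def slice(self, lo: int, hi: int) -> "Hit":
        """Part of this hit covering reference positions [lo, hi)."""
        return Hit(lo, self.seq[lo - self.start:hi - self.start])


def score(hit: Hit, reference: str) -> int:
    """Stand-in for a realignment score: count identities with the reference."""
    return sum(q == r for q, r in zip(hit.seq, reference[hit.start:hit.end]))


def stitch(hits, reference):
    """Resolve overlaps between hits so that no two regions overlap."""
    stack = []
    # Sort by first and last aligned position on the reference.
    for nxt in sorted(hits, key=lambda h: (h.start, h.end)):
        if stack and stack[-1].end > nxt.start:
            top = stack.pop()
            lo, hi = nxt.start, min(top.end, nxt.end)
            before = top.slice(top.start, lo)             # flank from the first region
            a, b = top.slice(lo, hi), nxt.slice(lo, hi)   # competing overlap parts
            best = a if score(a, reference) >= score(b, reference) else b
            # Trailing flank comes from whichever region extends further.
            if nxt.end > top.end:
                after = nxt.slice(hi, nxt.end)
            else:
                after = top.slice(hi, top.end)
            stack.append(Hit(top.start, before.seq + best.seq + after.seq))
        else:
            stack.append(nxt)
    return stack


reference = "MKTAYIAKQRQISFVKSHFS"
hits = [Hit(8, "QXQISFVK"), Hit(17, "HFS"), Hit(0, "MKTAYIAKQRXX")]
print([(h.start, h.end, h.seq) for h in stitch(hits, reference)])
# [(0, 16, 'MKTAYIAKQXQISFVK'), (17, 20, 'HFS')]
```

In this toy input, the overlap at reference positions 8 to 11 is taken from the second hit because it has 3 identities there against 2 for the first hit, while the flanks of both hits are preserved.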

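
The sliding window–based alignment trimming described in Materials and Methods can be sketched in the same spirit. The following Python example is an illustration only: the per-column distance (0 for identical residues, 1 for a mismatch or gap), the window size of 4, and the threshold of 0.5 are assumptions chosen for the example and are not Patchwork's defaults, and trim_alignment is a hypothetical name.

```python
def trim_alignment(query, reference, window=4, max_distance=0.5):
    """Drop alignment columns lying in any window whose mean distance
    between query and reference exceeds max_distance."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    n = len(query)
    if n == 0:
        return "", ""
    window = min(window, n)
    # Assumed distance: 0 for an identical residue pair, 1 for mismatch or gap.
    distance = [0 if q == r and q != "-" else 1
                for q, r in zip(query, reference)]
    keep = [True] * n
    # Scan left to right and mark every column of a too-distant window.
    for start in range(n - window + 1):
        if sum(distance[start:start + window]) / window > max_distance:
            for i in range(start, start + window):
                keep[i] = False
    trimmed_query = "".join(q for q, k in zip(query, keep) if k)
    trimmed_reference = "".join(r for r, k in zip(reference, keep) if k)
    return trimmed_query, trimmed_reference


# A gap-rich stretch between two well-aligned flanks is cut out.
print(trim_alignment("MKTAYIAK-X--W-VKSHFS", "MKTAYIAKQRQISFVKSHFS"))
# A single mismatch in an otherwise good region is retained.
print(trim_alignment("MKTAYLAKQR", "MKTAYIAKQR"))
```

Averaging over a window rather than judging each column on its own is what lets an isolated mismatch survive, in line with the stated aim of not cutting out a single bad, but correct, match.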

Lecture-Metagenomics - Using Mothur
No ratings yet
Lecture-Metagenomics - Using Mothur
48 pages
Swami
No ratings yet
Swami
12 pages
Phylogenetics
No ratings yet
Phylogenetics
108 pages
Phylogenetic Tree
No ratings yet
Phylogenetic Tree
25 pages
Paper Stack A Traducir
No ratings yet
Paper Stack A Traducir
23 pages
Data Mining of Public Genomic Repositories: Harnessing Off-Target Reads To Expand Microbial Pathogen Genomic Resources
No ratings yet
Data Mining of Public Genomic Repositories: Harnessing Off-Target Reads To Expand Microbial Pathogen Genomic Resources
17 pages
MEGA Guide for Phylogenetic Trees
No ratings yet
MEGA Guide for Phylogenetic Trees
3 pages
GENESPACE Tracks Regions of Interest
No ratings yet
GENESPACE Tracks Regions of Interest
20 pages
The Salvia Miltiorrhiza Genome High-Resolution PDF Download
100% (10)
The Salvia Miltiorrhiza Genome High-Resolution PDF Download
15 pages
MoriartyLemmon2013ARES Phylogenetics
No ratings yet
MoriartyLemmon2013ARES Phylogenetics
26 pages
Understanding Phylogenies
No ratings yet
Understanding Phylogenies
6 pages
PHYLIP Phylogeny Inference Guide
No ratings yet
PHYLIP Phylogeny Inference Guide
17 pages
Genome Sequence Assembly Guide
No ratings yet
Genome Sequence Assembly Guide
92 pages
Phylogeny Activity: A Citizen Science Project
No ratings yet
Phylogeny Activity: A Citizen Science Project
12 pages
Metagenome Data Denoising Algorithm
No ratings yet
Metagenome Data Denoising Algorithm
16 pages
Phyl o Genetics
No ratings yet
Phyl o Genetics
58 pages
Phylogeny
No ratings yet
Phylogeny
21 pages
Assignment5 BI12-223
No ratings yet
Assignment5 BI12-223
9 pages
BLAST-EXPLORER Helps You Building Datasets For Phylogenetic Analysis
No ratings yet
BLAST-EXPLORER Helps You Building Datasets For Phylogenetic Analysis
6 pages
Bosque: Phylogenetic Analysis Software
No ratings yet
Bosque: Phylogenetic Analysis Software
39 pages
Installing and Using Phylogentics Software: Clustalx, Phylip, Treeview
No ratings yet
Installing and Using Phylogentics Software: Clustalx, Phylip, Treeview
26 pages
Msystems 00473-24
No ratings yet
Msystems 00473-24
12 pages
Comparative Genomics 1st Edition Inna Dubchak (Auth.) 2025 Instant Download
No ratings yet
Comparative Genomics 1st Edition Inna Dubchak (Auth.) 2025 Instant Download
144 pages
Comparative Genomics 1st Edition Inna Dubchak (Auth.) Instant Access 2025
100% (2)
Comparative Genomics 1st Edition Inna Dubchak (Auth.) Instant Access 2025
71 pages
A Short Guide To Phylogeny Reconstruction
No ratings yet
A Short Guide To Phylogeny Reconstruction
5 pages
AGR322 - Genomics
No ratings yet
AGR322 - Genomics
16 pages
Sisteamtika Filogenetik Melly
No ratings yet
Sisteamtika Filogenetik Melly
11 pages
Microbial Genome Sequencing Guide
No ratings yet
Microbial Genome Sequencing Guide
23 pages
Multiple Sequence Alignment Tools: Tutorials and Comparative Analysis
No ratings yet
Multiple Sequence Alignment Tools: Tutorials and Comparative Analysis
19 pages
Phylogenetic Tree
No ratings yet
Phylogenetic Tree
15 pages
Brief Bioinform-2010-Li-473-83
No ratings yet
Brief Bioinform-2010-Li-473-83
11 pages
Nanopore Sequencing
No ratings yet
Nanopore Sequencing
16 pages
Pyelph - A Software Tool For Gel Images Analysis and Phylogenetics
No ratings yet
Pyelph - A Software Tool For Gel Images Analysis and Phylogenetics
6 pages
Phylogenetic Analysis
No ratings yet
Phylogenetic Analysis
11 pages
Phylogenetic Tree
No ratings yet
Phylogenetic Tree
12 pages
Kuschel 2017
No ratings yet
Kuschel 2017
6 pages
Shin Et Al. - 2018 - Phylogenomic Data Yield New and Robust Insights Into The Phylogeny and Evolution of Weevils
No ratings yet
Shin Et Al. - 2018 - Phylogenomic Data Yield New and Robust Insights Into The Phylogeny and Evolution of Weevils
14 pages
Davis e Engel - A Zygopine Weevil in Early Miocene Amber From The
No ratings yet
Davis e Engel - A Zygopine Weevil in Early Miocene Amber From The
4 pages
Lyal Et Al. - 2006 - Morphology and Systematic Significance of Sclerole
No ratings yet
Lyal Et Al. - 2006 - Morphology and Systematic Significance of Sclerole
40 pages
McKenna Et Al. - 2009 - Temporal Lags and Overlap in The Diversification o
No ratings yet
McKenna Et Al. - 2009 - Temporal Lags and Overlap in The Diversification o
6 pages
Antiquity and Evolution of Prosternal Horns in Baridine Weevils (Coleoptera: Curculionidae)
No ratings yet
Antiquity and Evolution of Prosternal Horns in Baridine Weevils (Coleoptera: Curculionidae)
10 pages
Hespenheide - 2018 - A Review of Philenis Champion, 1906 (Coleoptera C
No ratings yet
Hespenheide - 2018 - A Review of Philenis Champion, 1906 (Coleoptera C
24 pages
Molecular and Morphological Phylogenetics of Weevils PDF
No ratings yet
Molecular and Morphological Phylogenetics of Weevils PDF
25 pages
LawrenceNewton1995 Families and Subfamilies of Coleopteraocr PDF
No ratings yet
LawrenceNewton1995 Families and Subfamilies of Coleopteraocr PDF
142 pages
Alonso-Zarazaga & Lyal, 1999 - World Catalogue (Searchable) PDF
No ratings yet
Alonso-Zarazaga & Lyal, 1999 - World Catalogue (Searchable) PDF
316 pages
Tomaszewska PhylogenyandclassificationofCucujoidea 2015
No ratings yet
Tomaszewska PhylogenyandclassificationofCucujoidea 2015
35 pages
P1 - B - 2014 - Misof - Et - Al - 2014 - Insect Phylogenomics - SI PDF
No ratings yet
P1 - B - 2014 - Misof - Et - Al - 2014 - Insect Phylogenomics - SI PDF
131 pages
Hunt2007 PDF
No ratings yet
Hunt2007 PDF
5 pages
LawrenceNewton1995 Families and Subfamilies of Coleopteraocr PDF
No ratings yet
LawrenceNewton1995 Families and Subfamilies of Coleopteraocr PDF
142 pages
New Pacholenus Species in Brazil
No ratings yet
New Pacholenus Species in Brazil
7 pages
SABIO-Reaction Kinetics Database
No ratings yet
SABIO-Reaction Kinetics Database
3 pages
Ch21. Genomes and Their Evolution - Campbell Biology 12th
No ratings yet
Ch21. Genomes and Their Evolution - Campbell Biology 12th
25 pages
Module 1 - Session 3 - Part 3
No ratings yet
Module 1 - Session 3 - Part 3
21 pages
Mitochondrial tRNA Gene Detection
No ratings yet
Mitochondrial tRNA Gene Detection
4 pages
DP-HLS: A High-Level Synthesis Framework For Accelerating Dynamic Programming Algorithms in Bioinformatics
No ratings yet
DP-HLS: A High-Level Synthesis Framework For Accelerating Dynamic Programming Algorithms in Bioinformatics
15 pages
Chapter 21 Genomes and Their Evolution
No ratings yet
Chapter 21 Genomes and Their Evolution
8 pages
Svietnam National University - Ho Chi Minh City International University
No ratings yet
Svietnam National University - Ho Chi Minh City International University
11 pages
Bioinformatics KSOU
No ratings yet
Bioinformatics KSOU
260 pages
Lecture-7-Dynamic Programming Global-Sequence Alignment
No ratings yet
Lecture-7-Dynamic Programming Global-Sequence Alignment
31 pages
Biology Grade 10 ST (MT) (BOOK)
No ratings yet
Biology Grade 10 ST (MT) (BOOK)
177 pages
2yrs Mca Sem3
No ratings yet
2yrs Mca Sem3
9 pages
Bioinformatics 34 22 3939
No ratings yet
Bioinformatics 34 22 3939
3 pages
Genome Database & Information System For Daphnia: @bio - Indiana.edu
No ratings yet
Genome Database & Information System For Daphnia: @bio - Indiana.edu
14 pages
DDBJ, Bilogical Data Bases, Bioinformatics Data Base
No ratings yet
DDBJ, Bilogical Data Bases, Bioinformatics Data Base
2 pages
DNA Sequencing - Comprehensive Notes
No ratings yet
DNA Sequencing - Comprehensive Notes
5 pages
Genomics & Proteomics Course Guide
No ratings yet
Genomics & Proteomics Course Guide
2 pages
OmicsLogic Python-Powered Molecular Modeling Workshop
No ratings yet
OmicsLogic Python-Powered Molecular Modeling Workshop
7 pages
Identification of Bacteria Using 16srRNA
No ratings yet
Identification of Bacteria Using 16srRNA
17 pages
NGS Bioinfo Cap
No ratings yet
NGS Bioinfo Cap
14 pages
DDBJ
No ratings yet
DDBJ
2 pages
Proteomics Databases and Websites: September 2012
No ratings yet
Proteomics Databases and Websites: September 2012
22 pages
LC-MSsim-A Simulation Software
No ratings yet
LC-MSsim-A Simulation Software
18 pages
M.SC Part II Syllabus
No ratings yet
M.SC Part II Syllabus
41 pages
The University of Arizona
No ratings yet
The University of Arizona
16 pages
Microbial Genomics
No ratings yet
Microbial Genomics
10 pages
Cladograms DOLPHIN COW SHARK
No ratings yet
Cladograms DOLPHIN COW SHARK
4 pages
Novogene America Sample Submission Guidelines 2024v1.1
No ratings yet
Novogene America Sample Submission Guidelines 2024v1.1
9 pages
Modern Industrial Microbiology and Biotechnology 2nd Edition Nduka Okafor Download
100% (1)
Modern Industrial Microbiology and Biotechnology 2nd Edition Nduka Okafor Download
63 pages
RWRF
No ratings yet
RWRF
8 pages
C Accts Easyweb Chettinad Chettinadadmin Homework 050722 12BGENMSG050722025610 PM
No ratings yet
C Accts Easyweb Chettinad Chettinadadmin Homework 050722 12BGENMSG050722025610 PM
5 pages