Lax 2018 Hemimastigophora Supra Kingdom
https://doi.org/10.1038/s41586-018-0708-8
Almost all eukaryote life forms have now been placed within one of five to eight supra-kingdom-level groups using molecular phylogenetics1–4. The ‘phylum’ Hemimastigophora is probably the most distinctive morphologically defined lineage that still awaits such a phylogenetic assignment. First observed in the nineteenth century, hemimastigotes are free-living predatory protists with two rows of flagella and a unique cell architecture5–7; to our knowledge, no molecular sequence data or cultures are currently available for this group. Here we report phylogenomic analyses based on high-coverage, cultivation-independent transcriptomics that place Hemimastigophora outside of all established eukaryote supergroups. They instead comprise an independent supra-kingdom-level lineage that most likely forms a sister clade to the ‘Diaphoretickes’ half of eukaryote diversity (that is, the ‘stramenopiles, alveolates and Rhizaria’ supergroup (Sar), Archaeplastida and Cryptista, as well as other major groups). The previous ranking of Hemimastigophora as a phylum understates the evolutionary distinctiveness of this group, which has considerable importance for investigations into the deep-level evolutionary history of eukaryotic life, ranging from understanding the origins of fundamental cell systems to placing the root of the tree. We have also established the first culture of a hemimastigote (Hemimastix kukwesjijk sp. nov.), which will facilitate future genomic and cell-biological investigations into eukaryote evolution and the last eukaryotic common ancestor.

We identified two previously undescribed species of the rarely observed protist group Hemimastigophora (one Spironema and one Hemimastix) in enrichments from soil. Here we formally describe the newly identified Hemimastix species.

[Fig. 1: panels a–f (image not reproduced).]

Gene sequence. The partial small subunit ribosomal RNA (SSU rRNA) gene sequence of strain BW2H has been deposited in GenBank, accession code MF682191.

Comments. Cells are larger and have several more flagella than Hemimastix amphikineta, the only previously described species (14-μm by 7-μm cell body, 12 flagella per row; ref. 6).

Cells of H. kukwesjijk are oval in profile with a blunt anterior projection (the capitulum) and two rows of flagella along their whole length (Fig. 1b, Extended Data Fig. 1). In cultivation as strain BW2H, live cells were 16.5–20.5-μm long by 7–12.5-μm wide (18.3 ± 1 μm × 9.9 ± 1.2 μm; n = 61), with a sub-central, rounded nucleus and posterior contractile vacuole (Fig. 1c). Each row of 17–19 flagella (mean 18.4; n = 25) lay in a channel between the two thick thecal plates. The anteriormost 9 or 10 flagella were closely spaced, and the rest emerged from separate notches in the underlying plate (Fig. 1b, e). The capitulum was bordered by the overlapping anterior
1Department of Biology, Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada. 2Centre for Comparative Genomics and Evolutionary Bioinformatics, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada. 3Present address: Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden. 4These authors contributed equally: Gordon Lax, Yana Eglit, Laura Eme. *e-mail: [email protected]
Nature | www.nature.com/nature
© 2018 Springer Nature Limited. All rights reserved.
RESEARCH Letter
[Fig. 2 image: reference branches Hemimastix, Spironema and clone AY689797 (stream sediment); legend entries freshwater, marine, brackish, terrestrial and temperate.]

Fig. 2 | Environmental SSU rRNA/rDNA reads assigned to Hemimastigophora. The pplacer likelihood-to-weight ratio, habitat and environmental zone are reported for each read (denoted by sample code in the left columns). Reads with a likelihood-to-weight ratio > 0.5 are assigned to a branch. Five assigned sequences (denoted by a circled ‘?’) were of uncertain placement within Hemimastigophora: the likelihood-to-weight ratio for any single branch within the clade was < 0.5, but the sum of all likelihood-to-weight ratios = 1. See Extended Data Fig. 3 for full reference tree. Supplementary Table 1 gives additional information on individual reads, including sample codes.
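The assignment rule in the Fig. 2 caption can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function name, the dictionary input (branch identifier mapped to likelihood-to-weight ratio for one read) and the extra ‘unassigned’ outcome are assumptions introduced here, and no pplacer output file is parsed.

```python
import math

def classify_read(lwr_by_branch, clade_branches, threshold=0.5, tol=1e-6):
    """Classify one read from its placement weights.

    lwr_by_branch  -- dict: branch id -> likelihood-to-weight ratio (hypothetical input shape)
    clade_branches -- set of branch ids inside the clade of interest
    """
    # Branch carrying the largest share of the placement weight.
    best_branch = max(lwr_by_branch, key=lwr_by_branch.get)
    if best_branch in clade_branches and lwr_by_branch[best_branch] > threshold:
        return ("branch", best_branch)
    # No single branch passes, but all of the weight may still fall inside the clade.
    clade_sum = sum(w for b, w in lwr_by_branch.items() if b in clade_branches)
    if math.isclose(clade_sum, 1.0, abs_tol=tol):
        return ("uncertain_within_clade", None)
    # Assumption: anything else is treated as not assigned to the clade.
    return ("unassigned", None)

hemi = {"h1", "h2", "s1"}
print(classify_read({"h1": 0.7, "h2": 0.3}, hemi))
print(classify_read({"h1": 0.4, "h2": 0.35, "s1": 0.25}, hemi))
```

The second call corresponds to the circled ‘?’ reads: no branch exceeds 0.5, yet the ratios within the clade sum to 1.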
ends of the flagellar rows, and the adjacent plate margins housed extrusomes (undischarged, Fig. 1f; discharged, Extended Data Fig. 2c). Cells fed on a small stramenopile (Spumella sp.) after attachment at the capitulum, and enclosure by the anterior flagella (Fig. 1d, Extended Data Figs. 1h–k, 2a, b).

The Spironema species awaits formal description; the cells we isolated, which we discuss here as Spironema cf. multiciliatum (see ‘Identification of Spironema cf. multiciliatum’ in Methods), were spindle-shaped with a thin ‘tail’. These cells were 23–31-μm long by 4–7.5-μm wide (mean ± s.d., 27.4 ± 3.5 μm × 5.4 ± 1.6 μm; n = 7), with an oval nucleus and two rows of six or more flagella clustered in the anterior quarter, plus two or three flagella per row more posteriorly (Fig. 1a, Extended Data Fig. 1a).

We determined SSU rRNA sequences from both hemimastigotes, and used these to analyse published environmental sequence datasets to determine (1) the distribution of the group across habitats and (2) whether these sequences matched a known environmental clade. Unlike some other recently characterized lineages (for example, ref. 8), hemimastigotes do not appear to belong to a previously identified environmental clade. One unclassified long-read clone from freshwater sediment (AY689797) was phylogenetically related to Spironema (Fig. 2, Extended Data Fig. 3). An additional 37 short reads were detected among V4 or V9 amplicon datasets, or soil metatranscriptomes (Fig. 2, Supplementary Table 1). Many of the V4 and V9 amplicons derived from soil or freshwater, consistent with most light microscopy accounts7. However, nearly half of these amplicons came from marine sediment or water-column samples (Fig. 2), and one Hemimastix-like V4 amplicon was among the 25 most-abundant operational taxonomic units in a fjord sediment dataset (Supplementary Table 1).

To place hemimastigotes in the tree of eukaryotes, we generated transcriptomes from isolated single cells of both Spironema and Hemimastix, and assembled 351-gene datasets with a broad sampling of eukaryote taxa (initially 104 taxa; this was reduced to 61 taxa for computationally intensive analyses). The transcriptomes proved to be high-coverage (Spironema, 290 of 351 = 82.6% of genes and 77.6% of sites represented; Hemimastix, 280 of 351 = 79.7% of genes, 72.1% of sites). Maximum likelihood analyses of both the 104-taxon and 61-taxon datasets were consistent with other recent phylogenomic studies3,9,10 in dividing previously known eukaryotes into three clans: Diaphoretickes, Discoba and an ‘Amorphea+’ assemblage (Fig. 3, Extended Data Fig. 4). The major subgroups of Diaphoretickes were Sar plus Telonema, Haptophyta plus Centrohelida, and Cryptista plus Archaeplastida and Picozoa. The ‘Amorphea+’ group contained Obazoa and Amoebozoa, as well as collodictyonids, rigifilids, Mantamonas, Ancyromonadida, Malawimonadida and Metamonada. The position of metamonads was unstable, which mirrors conflicts seen in other recent analyses9,11.

Spironema and Hemimastix formed a maximally supported Hemimastigophora clade that was phylogenetically isolated. The 104-taxa analysis placed Hemimastigophora amongst the deepest branches within Diaphoretickes, as the sister of a clade of Sar, Telonema, haptophytes and centrohelids, though with equivocal support (ultrafast bootstrap approximation = 83%; Fig. 4, Extended Data Fig. 4). In the 61-taxa analysis, Hemimastigophora again grouped with Diaphoretickes (bootstrap support = 100% (posterior mean site frequency method); ultrafast bootstrap approximation = 93%; Bayesian posterior probability = 1), but actually branched sister to all of the other Diaphoretickes, which formed a clade (bootstrap support = 88%; ultrafast bootstrap approximation = 60%; Bayesian posterior probability = 1; Fig. 3).

To further explore the position of Hemimastigophora, we analysed several derivatives of the 61-taxon dataset that excluded potential sources of phylogenetic inaccuracy. Analyses that (i) excluded the three taxa identified as outlier long-branches (dataset referred to as ‘58-nLB’), (ii) excluded the three data-poorest taxa (site coverage < 30%; dataset referred to as ‘58-nDP’) or (iii) recoded the amino
[Fig. 3 image: 61-taxon tree. Labelled groups: Diaphoretickes (Sar, Telonema, Haptophyta, Centrohelida, Archaeplastida, Picozoa, Cryptista); Hemimastigophora (Spironema, Hemimastix); Discoba; Amorphea+ (Obazoa, Amoebozoa, CRuMs, Malawimonadida, Ancyromonadida, Metamonada). Scale bar, 0.1.]

Fig. 3 | Phylogenetic placement of Hemimastigophora within eukaryotes. Unrooted phylogeny inferred from 351 genes and 61 taxa, using maximum likelihood under the ‘LG + C60 + F + Γ’ model. The numbers on branches show, in order from top to bottom or from left to right, posterior mean site frequency (PMSF) bootstrap percentages (bootstrap support; 200 true bootstrap replicates), ultrafast bootstrap approximation percentages (1,000 replicates) and Bayesian posterior probabilities (under the ‘CAT + GTR’ model). Filled circles denote maximum support with all methods (that is, 100, 100 and 1, respectively). The three longest branches (leading to Bodo, Diplonema and Tetrahymena) are shown reduced by 1/3. CRuMs: collodictyonids, rigifilids and Mantamonas. NA, not available. Scale bar denotes 0.1 expected substitutions per site.
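The three longest branches named in the caption are the same taxa that the Methods flag as ‘long-branching’ outliers, using average tip-to-tip distances and a three-standard-deviation cut-off. The authors' custom script is not shown in the paper, so the sketch below is only one plausible reading of that rule. Assumptions introduced here: the input is a nested dictionary of pairwise tip-to-tip distances, the ‘centre’ of the distribution is taken to be the mean, the population standard deviation is used, and all function names are hypothetical.

```python
import statistics

def average_tip_distances(dist):
    """Mean tip-to-tip distance from each taxon to every other taxon.

    dist -- dict: taxon -> dict: other taxon -> pairwise distance (hypothetical input shape)
    """
    return {
        taxon: statistics.mean([d for other, d in row.items() if other != taxon])
        for taxon, row in dist.items()
    }

def long_branch_outliers(dist, n_sd=3.0):
    """Taxa whose average distance exceeds the mean by more than n_sd standard deviations."""
    avg = average_tip_distances(dist)
    values = list(avg.values())
    centre = statistics.mean(values)    # assumption: centre = arithmetic mean
    spread = statistics.pstdev(values)  # assumption: population standard deviation
    return sorted(t for t, a in avg.items() if a > centre + n_sd * spread)

# Toy data: 19 taxa at distance 1.0 from each other, one taxon at distance 10.0 from all.
taxa = [f"t{i}" for i in range(19)] + ["long"]
toy = {a: {b: (10.0 if "long" in (a, b) else 1.0) for b in taxa if b != a} for a in taxa}
print(long_branch_outliers(toy))
```

With few taxa a single outlier cannot exceed three standard deviations of a distribution it belongs to, so the toy example uses 20 taxa; the paper's dataset had 61.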
acid data into four categories (dataset referred to as ‘61-SR4’) all supported the same topology as the original 61-taxa analysis, that is, Hemimastigophora outside of and sister to Diaphoretickes (Fig. 4, Extended Data Figs. 5–7). However, removing fast-evolving sites did not systematically favour the tree inferred in the 61-taxa analysis over a topology in which Hemimastigophora is sister to a Sar + Telonema + Haptophyta + Centrohelida clade (as in the 104-taxa analysis; Extended Data Fig. 8). Thus, although most analyses place Hemimastigophora as branching outside other Diaphoretickes, the alternative position, in which hemimastigotes fall one node inside Diaphoretickes, remains credible (Fig. 4).

All previous proposals for the phylogenetic or systematic placement of Hemimastigophora were based on morphology alone. The sub-membranous thecal plates between the two rows of flagella suggested an affinity with euglenids, which have a pellicle6,7. Subsequently, affinities were proposed with completely different taxa that have pellicular or thecal structures, namely alveolates12, or apusomonads and ancyromonads13. A placement within Rhizaria was also suggested on the basis of flagellum and extrusome substructure14. None of these proposals is supported by our phylogenies, because Hemimastigophora is always distantly related to euglenids (Euglenozoa, in Discoba), apusomonads and ancyromonads (both in Amorphea+) and Sar (which contains Alveolata and Rhizaria).

Instead, the extremely deep phylogenetic position of Hemimastigophora, most likely at the base of Diaphoretickes, implies that they represent a novel, supra-kingdom-level lineage. This identifies hemimastigotes as a crucial group to include in descriptions of the tree of eukaryote life, and in most studies of the evolution of eukaryotic cells. This is especially important when inferring the history of eukaryotic innovations, or the nature of the last eukaryotic common ancestor, from the distributions across supergroups of particular genes, genome characteristics or cellular features15–19. Hemimastigotes may be equally important in the immensely challenging task of placing the root of the eukaryote tree. The root is usually inferred20–23 to lie somewhere between the largest eukaryote clans (approximately in one of the positions marked a, b or c in Fig. 4), with position a (between Amorphea, and Diaphoretickes plus Discoba) currently being the most favoured22,23. Hemimastigophora appears to lie close to all of these positions on the unrooted tree (see Fig. 4), and could be our only known representative of one of the most ancient divisions amongst extant eukaryotes. Accordingly, we searched the single-cell transcriptomes for genes that could have arisen during the divergences between supergroups (Fig. 4, Supplementary Table 3). We found several genes in hemimastigotes that are not known from Diaphoretickes, including those for myosin II (previously known from Amorphea, and one subgroup of Discoba18,24) and Golgi protein GCP16 (also known as golgin A7; previously specific to Amorphea)19. The presence of such genes in hemimastigotes either pushes back their likely origins to before the last eukaryotic common ancestor (or supports this inference) or, more controversially, could be due to the root of
[Fig. 4 image: summary tree with Diaphoretickes (Sar + Telonemia: Alveolata, Stramenopila, Rhizaria; Haptophyta + Centrohelida; Archaeplastida + Picozoa; Cryptista), Hemimastigophora (main and alternate topology), Discoba and Amorphea+ (Obazoa, Amoebozoa), plus Metamonada; candidate root positions a, b and c. Support shown as BS/UFB: 61-taxa, 88/60; 58-nLB, 96/65; 58-nDP, 91/63; 61-SR4, 95; alternate topology in the 104-taxa analysis, 83 (UFB). The column labels of the gene-distribution panel are not recoverable from the extracted text.]

Fig. 4 | Summary of phylogenomic analyses and distribution of select genes across eukaryotes. Left, inferred phylogenetic positions of Hemimastigophora. Box with solid outline details the support for Hemimastigophora as a deep branch relative to the ‘Diaphoretickes’ supergroup, in various analyses. BS, PMSF bootstrap support (except 61-SR4 for which the ‘GTR + R6 + F’ model was used). UFB, ultrafast bootstrap approximation support. Dashed box shows support for the alternative topology (Hemimastigophora as a deep branch within Diaphoretickes) in the 104-taxa analysis. Stepwise fast-site removal analyses equivocated between these alternatives (see Extended Data Fig. 8). Labels a, b and c show the possible positions of the eukaryote root; the likely placement of Hemimastigophora results in several variants of position c. Right, known distributions of selected proteins encoded by genes with proposed deep origins among living eukaryotes that were detected in hemimastigote transcriptomes. Boxes filled in their top half denote genes detected in Spironema; boxes filled in their bottom half denote genes detected in Hemimastix; completely filled boxes represent genes detected in both organisms; see Supplementary Table 3 for details.
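The stepwise fast-site removal mentioned in the caption is described in the Methods as ten successive removals of the fastest-evolving sites, about 4% of sites per step, based on per-site rates from IQ-TREE. The sketch below illustrates that bookkeeping only. It assumes the per-site rates have already been read into a plain list (no IQ-TREE output file is parsed), and the function names, the rounding choice and the toy alignment are hypothetical.

```python
def stepwise_fast_site_removal(rates, n_steps=10, fraction_per_step=0.04):
    """Return, for each step, the indices of the sites that are kept.

    rates -- list of per-site rates, one entry per alignment column (assumed input)
    """
    n_sites = len(rates)
    # Site indices ordered from fastest-evolving to slowest-evolving.
    fastest_first = sorted(range(n_sites), key=lambda i: rates[i], reverse=True)
    kept_per_step = []
    for step in range(1, n_steps + 1):
        # Cumulative removal: about 4% after step 1, 8% after step 2, and so on.
        n_removed = round(n_sites * fraction_per_step * step)
        removed = set(fastest_first[:n_removed])
        kept_per_step.append([i for i in range(n_sites) if i not in removed])
    return kept_per_step

def subset_alignment(alignment, kept):
    """Keep only the chosen columns of each aligned sequence."""
    return {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}

# Toy example: 100 sites whose rate equals their index, so the last sites are fastest.
steps = stepwise_fast_site_removal(list(range(100)))
print([len(kept) for kept in steps])
print(subset_alignment({"taxonA": "ABCDE"}, [0, 2, 4]))
```

Each of the ten reduced alignments would then be analysed separately, which is how support can be plotted against the percentage of sites remaining.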
eukaryotes being further from the base of Amorphea than generally supposed23, that is, Amorphea and Hemimastigophora being on the same side of the root (shown by the top variant of position c in Fig. 4). However, another hemimastigote myosin-family gene was previously unknown outside the Sar clade (Fig. 4): irrespective of the final position of the root, this survey demonstrates that the antiquity of gene origins tends to be underestimated until all major lineages are considered. This bias can result in the underestimation of the gene content of ancient eukaryotes, and thus overestimations of the simplicity of their cell biology. Examining hemimastigote genomes, and ultimately their cell biology, will be valuable for better understanding eukaryote evolution at the deepest levels.

This study has used single-cell transcriptomics to unveil a deep-branching eukaryote lineage. Single-cell transcriptomics and genomics25–27 bypass the ‘culture bottleneck’ and thus provide a rapid path to deeper taxon sampling, even when species from a group of interest are eventually cultivated. This is particularly valuable for phylogenomics, in which inaccuracy owing to poor taxon sampling is a perpetual concern28. For this application, single-cell transcriptomics outperforms single-cell genomics because of better coverage of housekeeping genes (see, for example, refs 26,27). Information on multiple related species is also valuable for ensuring data fidelity (detecting contaminants, gene transfers and so on; see Methods). Single-cell techniques are especially promising for the heterotrophic protozoa that probably represent most ‘undiscovered’ major lineages, and for which establishing cultures with suitable prey or hosts can be challenging25,27,29,30.

In this molecular phylogenetic investigation of Hemimastigophora, we show that they are a previously unrecognized supergroup of eukaryotes. Their phylogenetic distinctiveness is comparable to the whole animal plus fungi clade (Opisthokonta) or the assemblage containing all land plants and primary algae (Archaeplastida). We expect the discovery or recognition of other important lineages will greatly accelerate owing to similar applications of single-cell methods.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, statements of data availability and associated accession codes are available at https://doi.org/10.1038/s41586-018-0708-8.

Received: 26 October 2017; Accepted: 21 September 2018;
Published online xx xx xxxx.

1. Burki, F. The eukaryotic tree of life from a global phylogenomic perspective. Cold Spring Harb. Perspect. Biol. 6, a016147 (2014).
2. Worden, A. Z. et al. Rethinking the marine carbon cycle: factoring in the multifarious lifestyles of microbes. Science 347, 1257594 (2015).
3. Burki, F. et al. Untangling the early diversification of eukaryotes: a phylogenomic study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista. Proc. R. Soc. Lond. B 283, 20152802 (2016).
4. Simpson, A. G. B. & Eglit, Y. in Encyclopedia of Evolutionary Biology Vol. 3 (ed. Kliman, R. M.) 344–360 (Elsevier, Amsterdam, 2016).
5. Klebs, G. Flagellatenstudien (Akademische Verlags-Gesellschaft, Leipzig, 1893).
6. Foissner, W., Blatterer, H. & Foissner, I. The Hemimastigophora (Hemimastix amphikineta nov. gen., nov. spec.), a new protistan phylum from Gondwanian soils. Eur. J. Protistol. 23, 361–383 (1988).
7. Foissner, I. & Foissner, W. Revision of the family Spironemidae Doflein (Protista, Hemimastigophora), with description of two new species, Spironema terricola n. sp. and Stereonema geiseri n. g., n. sp. J. Eukaryot. Microbiol. 40, 422–438 (1993).
8. Yubuki, N. et al. Morphological identities of two different marine stramenopile environmental sequence clades: Bicosoeca kenaiensis (Hilliard, 1971) and Cantina marsupialis (Larsen and Patterson, 1990) gen. nov., comb. nov. J. Eukaryot. Microbiol. 62, 532–542 (2015).
9. Brown, M. W. et al. Phylogenomics demonstrates that breviate flagellates are related to opisthokonts and apusomonads. Proc. R. Soc. Lond. B 280, 20131755 (2013).
10. Zhao, S. et al. Collodictyon—an ancient lineage in the tree of eukaryotes. Mol. Biol. Evol. 29, 1557–1568 (2012).
11. Cavalier-Smith, T. et al. Multigene eukaryote phylogeny reveals the likely protozoan ancestors of opisthokonts (animals, fungi, choanozoans) and Amoebozoa. Mol. Phylogenet. Evol. 81, 71–85 (2014).
12. Cavalier-Smith, T. A revised six-kingdom system of life. Biol. Rev. Camb. Philos. Soc. 73, 203–266 (1998).
13. Cavalier-Smith, T. in The Flagellates, The Systematics Association Special Volume Series 59 (eds Leadbeater, B. S. C. & Green, J. C.) 361–390 (Taylor & Francis, London, 2000).
14. Cavalier-Smith, T., Lewis, R., Chao, E. E., Oates, B. & Bass, D. Morphology and phylogeny of Sainouron acronematica sp. n. and the ultrastructural unity of Cercozoa. Protist 159, 591–620 (2008).
15. Speijer, D., Lukeš, J. & Eliáš, M. Sex is a ubiquitous, ancient, and inherent attribute of eukaryotic life. Proc. Natl Acad. Sci. USA 112, 8827–8834 (2015).
16. de Mendoza, A. et al. Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc. Natl Acad. Sci. USA 110, E4858–E4866 (2013).
17. Fukasawa, Y., Oda, T., Tomii, K. & Imai, K. Origin and evolutionary alteration of the mitochondrial import system in eukaryotic lineages. Mol. Biol. Evol. 34, 1574–1586 (2017).
18. Sebé-Pedrós, A., Grau-Bové, X., Richards, T. A. & Ruiz-Trillo, I. Evolution and classification of myosins, a paneukaryotic whole-genome approach. Genome Biol. Evol. 6, 290–305 (2014).
19. Barlow, L. D., Nývltová, E., Aguilar, M., Tachezy, J. & Dacks, J. B. A sophisticated, differentiated Golgi in the ancestor of eukaryotes. BMC Biol. 16, 27 (2018).
20. He, D. et al. An alternative root for the eukaryote tree of life. Curr. Biol. 24, 465–470 (2014).
21. Katz, L. A., Grant, J. R., Parfrey, L. W. & Burleigh, J. G. Turning the crown upside down: gene tree parsimony roots the eukaryotic tree of life. Syst. Biol. 61, 653–660 (2012).
22. Derelle, R. & Lang, B. F. Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol. Biol. Evol. 29, 1277–1289 (2012).
23. Derelle, R. et al. Bacterial proteins pinpoint a single eukaryotic root. Proc. Natl Acad. Sci. USA 112, E693–E699 (2015).
24. Richards, T. A. & Cavalier-Smith, T. Myosin domain evolution and the primary divergence of eukaryotes. Nature 436, 1113–1118 (2005).
25. Kolisko, M., Boscaro, V., Burki, F., Lynn, D. H. & Keeling, P. J. Single-cell transcriptomics for microbial eukaryotes. Curr. Biol. 24, R1081–R1082 (2014).
26. Yoon, H. S. et al. Single-cell genomics reveals organismal interactions in uncultivated marine protists. Science 332, 714–717 (2011).
27. Gawryluk, R. M. R. et al. Morphological identification and single-cell genomics of marine diplonemids. Curr. Biol. 26, 3053–3059 (2016).
28. Keeling, P. J. et al. The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12, e1001889 (2014).
29. Caron, D. A. et al. Probing the evolution, ecology and physiology of marine protists using transcriptomics. Nat. Rev. Microbiol. 15, 6–20 (2017).
30. Krabberød, A. K. et al. Single cell transcriptomics, mega-phylogeny, and the genetic basis of morphological innovations in Rhizaria. Mol. Biol. Evol. 34, 1557–1573 (2017).

Acknowledgements The authors thank P. Li and P. Scallion (Dalhousie University) for assistance with electron microscopy, M. Dlutek (Dalhousie University) for Illumina sequencing, S. Geisen (Wageningen University) for providing parsed metatranscriptomic data, F. Mahé (CIRAD, Montpellier) for access to and parsing much of the V4 data, M. Brown (Mississippi State) for the seed phylogenomic dataset, A. Sebé-Pedrós (Weizmann Institute of Science) for the seed myosin alignments, M. Kolisko (Institute of Parasitology, Czech Academy of Sciences) for data handling scripts, B. Q. Minh (University of Vienna) for substantial help with phylogenomic analyses and troubleshooting in IQ-TREE, and R. Lewis (Nova Scotia Museum) and B. Francis for advice on Mi’kmaq tradition and language. This work was supported by CIFAR, NSERC grant 298366-2014 to A.G.B.S. and NSERC grant 2016-016792 to A.J.R.

Reviewer information Nature thanks I. Ruiz-Trillo and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author contributions Y.E. isolated the organisms and cultivated H. kukwesjijk. Y.E. and G.L. undertook the microscopy. G.L. performed the single-cell transcriptomics. Y.E., G.L. and E.M.B. analysed the rDNA and environmental sequence data. G.L., L.E., Y.E. and A.G.B.S. assembled the phylogenomic datasets. G.L., L.E. and A.J.R. performed phylogenomic analyses. L.E. and Y.E. performed the gene presence analyses. G.L., Y.E. and A.G.B.S. wrote the manuscript, with input from all co-authors.

Competing interests The authors declare no competing interests.

Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41586-018-0708-8.
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-018-0708-8.
Reprints and permissions information is available at http://www.nature.com/reprints.
Correspondence and requests for materials should be addressed to A.G.B.S.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Several further sets of analyses were conducted on derivatives of the 61-taxon dataset. First, we used a custom script to calculate average tip-to-tip distances for each taxon and identify ‘long-branching’ outliers (that is, taxa for which the average tip-to-tip branch lengths were longer than three standard deviations from the centre of the distribution of average branch lengths). Removing the three identified outliers (Bodo, Diplonema and Tetrahymena) yielded the ‘58 taxa, no longest branches’ (58-nLB) dataset. This was analysed using maximum likelihood, as per the main 61-taxa analysis (IQ-TREE with LG + C60 + F + Γ, with 1,000-replicate ultrafast bootstrap approximation, and 200 bootstraps using PMSF with the LG + C60 + F + Γ maximum likelihood tree for the 58-nLB dataset as the guide tree).

Second, we deleted the three most data-poor taxa, each of which had site coverage < 30% (Telonema, Gromia and the picozoan PB58411a), resulting in a ‘58 taxa, no data-poor species’ (58-nDP) dataset. This was analysed using maximum likelihood as per the main 61-taxa analysis, except that the PMSF bootstrap analysis was based on 100 replicates.

Third, we recoded the main 61-taxon dataset into four distinct categories of amino acids (SR4 scheme54), to address possible compositional heterogeneity. The resulting 61-SR4 dataset was analysed with IQ-TREE under a GTR + R6 + F model, with 500 real bootstrap replicates.

Fourth, we used the assignment of per-site rates in IQ-TREE (-wsr flag) for the main 61-taxon dataset, and progressively removed the fastest-evolving sites in 10 steps, with approximately 4% of the sites removed in each step. This yielded 10 ‘stepwise fastest sites removed’ (61-SFSR) datasets. To exclude the influence of the position of Hemimastigophora in the guide trees for subsequent PMSF analyses, we deleted the two hemimastigotes from the full dataset and the 10 SFSR datasets (that is, 11 total) with phyx version 0.1 (ref. 55), and pruned these two species from the maximum likelihood tree from the 61-taxon dataset. The pruned tree was then used as the guide tree to calculate PMSF profiles (‘PMSF-nHEMI’) under LG + C60 + F + Γ. For each of the original 11 datasets (that is, datasets that included hemimastigotes), we then inferred support for important bipartitions under this LG + C60 + F + Γ PMSF model using a 1,000-replicate, ultrafast bootstrap approximation, and plotted these support values against the percentage of sites remaining (Extended Data Fig. 8). This method of generating the PMSF model (PMSF-nHEMI) and evaluating statistical support differs from the main analyses (for example, 61-taxon, 58-nLB or 58-nDP), and the support values cannot be directly compared between these analyses and the 61-SFSR analyses.

Identification of non-universal ancient genes. To search the hemimastigote transcriptome data for gene innovations that potentially originated early in the evolution of crown eukaryotes (and thus may also represent synapomorphies that provide information about the relationships between supergroups), we collated a set of gene systems reported in the literature to include genes with widespread (but not universal) distributions across major eukaryote groups. Specific genes were selected on the basis of their presence in more than one species-rich ‘supergroup’ of eukaryotes, for example both Obazoa and Amoebozoa (see Supplementary Table 3). For this purpose, Metamonada and Discoba were considered distinct supergroups. Sequences were retrieved from GenBank or from the literature, and used as BLASTp queries against both hemimastigote transcriptomes, translated into amino acid sequences using a custom script (default genetic code). Where genes were not identified with BLASTp, hidden Markov model profiles were obtained either from the PFAM database or the literature (as indicated in Supplementary Table 3), or were built de novo from the alignments in the corresponding literature using hmmbuild, and then scanned for in both hemimastigote transcriptomes using hmmscan (both hmmbuild and hmmscan from the Hmmer-3.1b2 package56). Genes that were retrieved in only one of the hemimastigote transcriptomes were used as BLASTp queries against the other. Hemimastigote candidate orthologues were verified by reciprocal BLASTp against the nr database, and, where appropriate, domain annotation databases (InterProScan and SMART), and

Paramastix (globular) or Stereonema (elongate but main rows of flagella about half the length of the cell7,58,59). There are three previously described species of Spironema: Spironema terricola, Spironema goodeyi and Spironema multiciliatum. The shape and size of our specimens is inconsistent with S. terricola and S. goodeyi, both of which are very long and thin7. In addition, neither of these species has any posterior flagella. Our cells are similar in shape to S. multiciliatum5. The number of flagella in the ‘main row’ and the presence of a few difficult-to-observe flagella towards the posterior end are also broadly consistent with a previous account of S. multiciliatum, in which such posterior flagella were seen in some cells5,7. However, our cells are 23–31 μm in length (mean: 27.4 μm (s.d., 3.45 μm); n = 7; see main text), which is markedly longer than the 18-μm length reported for S. multiciliatum. Thus, we determined that our specimens are similar, but not identical, to S. multiciliatum.

Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Raw reads of Spironema and Hemimastix transcriptomes are deposited in GenBank under accession codes SRR6032743 and SRR6032744, respectively. The assembled Hemimastix and Spironema transcriptomes, 351 individual-gene alignments (104 taxa), concatenated and trimmed alignments and tree-files for the 104-taxon, 61-taxon, 58-nLB, 58-nDP, 61-SR4 and 61-SFSR datasets, alignments and tree files for non-universal ancient genes, raw light microscopy and scanning electron microscopy images, and the SSU rDNA alignment and tree-files have been deposited in Dryad (https://doi.org/10.5061/dryad.n5g39d7). The partial SSU rDNA gene sequence of H. kukwesjijk strain BW2H is deposited in GenBank, under accession code MF682191. This publication has been registered with the ZooBank database (http://zoobank.org/) with the Life Science Identifier urn:lsid:zoobank.org:pub:4BA2A83C-8363-4EBE-A9C7-097CA470F9FB, and the name Hemimastix kukwesjijk has been deposited in ZooBank with the Life Science Identifier urn:lsid:zoobank.org:act:32E12332-A418-40E2-BF4C-F2BFD94BF4CF.

31. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
32. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
33. Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
34. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
35. Huse, S. M. et al. VAMPS: a website for visualization and analysis of microbial population structures. BMC Bioinformatics 15, 41 (2014).
36. de Vargas, C. et al. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605 (2015).
37. BioMarKs Consortium. BioMarKs data portal http://www.biomarks.eu (2011).
38. Mahé, F. et al. Parasites dominate hyperdiverse soil protist communities in Neotropical rainforests. Nat. Ecol. Evol. 1, 0091 (2017).
39. Marquardt, M., Vader, A., Stübner, E. I., Reigstad, M. & Gabrielsen, T. M. Strong seasonality of marine microbial eukaryotes in a high-arctic fjord (Isfjorden, in West Spitsbergen, Norway). Appl. Environ. Microbiol. 82, 1868–1880 (2016).
40. Geisen, S. et al. Metatranscriptomic census of active protists in soils. ISME J. 9, 2178–2190 (2015).
41. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
42. Berger, S. A. & Stamatakis, A. Aligning short reads to reference alignments and trees. Bioinformatics 27, 2068–2075 (2011).
43. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).
44. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for
then added to pre-existing alignments from corresponding references (as shown Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
in Supplementary Table 3) via profile alignment using MUSCLE in Seaview version 45. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data
without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
4.632,57. Where phylogenies were necessary to further confirm identity (particularly 46. Brown, M. W. et al. Phylogenomics places orphan protistan lineages in a novel
in the case of multigene families), the alignments were trimmed using BMGE eukaryotic super-group. Genome Biol. Evol. 10, 427–433 (2018).
version 1.148 (-m BLOSUM30), and phylogenies estimated in IQ-TREE version 47. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software
1.5.549 under the LG4X model. An alignment for HPS1 was not available in the version 7: improvements in performance and usability. Mol. Biol. Evol. 30,
original publication and was instead assembled from sequences from GenBank and 772–780 (2013).
48. Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy):
publicly available transcriptomes, and aligned via MAFFT-L-INS-i47. Because of
a new software for selection of phylogenetic informative regions from multiple
the large size of the myosin gene family and the level of divergence between various sequence alignments. BMC Evol. Biol. 10, 210 (2010).
paralogues, myosin homologues were instead aligned with MAFFT-E-INS-i and 49. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and
trimmed less conservatively (BMGE; -m BLOSUM30 -b 2), with the corresponding effective stochastic algorithm for estimating maximum-likelihood phylogenies.
phylogeny estimated under the LG + C60 + F + Γ model. Mol. Biol. Evol. 32, 268–274 (2015).
Identification of Spironema cf. multiciliatum. The cells we discuss as Spironema 50. Minh, B. Q., Nguyen, M. A. T. & von Haeseler, A. Ultrafast approximation for
phylogenetic bootstrap. Mol. Biol. Evol. 30, 1188–1195 (2013).
cf. multiciliatum have an elongate shape (Fig. 1a, Extended Data Fig. 1a) and the 51. Wang, H. C., Minh, B. Q., Susko, E. & Roger, A. J. Modeling site heterogeneity with
‘main row’ of flagella is restricted to the anterior portion. These features iden- posterior mean site frequency profiles accelerates accurate phylogenomic
tify this organism with Spironema rather than Hemimastix (broad and flattened), estimation. Syst. Biol. 67, 216–235 (2018).
52. Lartillot, N., Lepage, T. & Blanquart, S. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288 (2009).
53. Lartillot, N. & Philippe, H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109 (2004).
54. Susko, E. & Roger, A. J. On reduced amino acid alphabets for phylogenetic inference. Mol. Biol. Evol. 24, 2139–2150 (2007).
55. Brown, J. W., Walker, J. F. & Smith, S. A. Phyx: phylogenetic tools for unix. Bioinformatics 33, 1886–1888 (2017).
56. Eddy, S. R. Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011).
57. Gouy, M., Guindon, S. & Gascuel, O. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221–224 (2010).
58. Foissner, W. & Foissner, I. in An Illustrated Guide to the Protozoa 2nd edn (eds Lee, J. J. et al.) 1185–1186 (Society of Protozoologists and Allen Press, Lawrence, 2002).
59. Zolffel, M. & Skibbe, O. Rediscovery of the multiflagellated protist Paramastix conifera Skuja 1948 (Protista incertae sedis). Nova Hedwigia 65, 443–452 (1997).
Extended Data Fig. 1 | Light micrographs of studied hemimastigotes. a–m, Spironema cf. multiciliatum (a) and Hemimastix kukwesjijk (b–m) differential interference contrast micrographs of live cells. a, Two views of a Spironema cf. multiciliatum cell, with inset that details the posterior end. Note the nucleus (marked by ‘n’), the detail of one of the posterior flagella (marked by an arrow, in the inset) and small contractile vacuole (cv, in inset), as well as posterior tail (line in inset). b, c, Optical sections through one H. kukwesjijk cell, detailing the notches from which flagella emerge (arrowheads), a section through the capitulum (marked with a ‘c’) and a conspicuous contractile vacuole in the cell posterior (shown in b). d, Surface view of one of the two thecal plates. e–g, Optical cross-sections of different cells showing the capitulum (e), mid-body region with rotationally symmetrical plate overlap (f) and the posterior (g) with radial arrangement of the posterior-most flagella. h–j, Pseudoseries that illustrates the feeding process, showing the progression of prey-ingestion stages. Note the widening capitulum and beginning of formation of the phagocytic vacuole. k, Same cell as in j, showing the anterior flagella curving forward to surround prey (seen especially in early feeding). l, m, Dividing cells, showing the diagonal symmetry of short new rows (nr) and longer old rows (or) of flagella, as well as the daughter nuclei (n). Scale bar, 10 μm.
Extended Data Fig. 2 | Scanning electron microscopy images of H. kukwesjijk. a, Feeding cell, general view (anterior to left; note the prey item attached to capitulum). b, Close-up of anterior end showing ingestion in progress at the capitulum. c, Discharged extrusomes (ex; triggered by the fixation process) along margin of the capitulum (compare to undischarged extrusomes in Fig. 1d). d, Dividing cells, with the left-most cell clearly showing the old row of full-length flagella (or) and the new row with short flagella (nr). Scale bars, 5 μm (a, d), 2 μm (b, c).
Extended Data Fig. 3 | SSU rDNA phylogeny of eukaryotes. Phylogeny inferred from 111 taxa and 1,252 sites under the GTR + Γ model in RAxML. Hemimastigophora (including H. kukwesjijk and Spironema cf. multiciliatum from this study) are shown in red. Colours of other sequence names correspond to the same taxonomic groupings as in Fig. 3. The sequence of Spumella sp. strain BW2S, the prey for H. kukwesjijk, is included and marked with an asterisk. The numbers on branches show bootstrap percentages (1,000 replicates; values below 50% not shown). Branches in grey are half their original length. This tree was the reference phylogeny for pplacer analyses shown in Fig. 2. Scale bar denotes 0.1 expected substitutions per site.
Extended Data Fig. 4 | Unrooted phylogeny of eukaryotes, 104 taxa dataset. Phylogeny inferred from 351 genes, using maximum likelihood under the LG + C60 + F + Γ model. The numbers on branches show ultrafast bootstrap approximation percentages, with filled circles denoting 100% support. The Carpediemonas branch is shown reduced by 1/3 of the original length for display purposes. Scale bar denotes 0.1 expected substitutions per site.
Extended Data Fig. 5 | Unrooted phylogeny using 58-nLB dataset. Phylogeny inferred from 351 genes, using maximum likelihood under the LG + C60 + F + Γ model. The numbers on branches show PMSF bootstrap percentages (bootstrap support PMSF; 200 true bootstrap replicates), then ultrafast bootstrap approximation percentages (1,000 replicates). Filled circles denote 100% support with both methods. Scale bar denotes 0.1 expected substitutions per site.
Extended Data Fig. 6 | Unrooted phylogeny using 58-nDP dataset. Phylogeny inferred from 351 genes, using maximum likelihood under the LG + C60 + F + Γ model. The numbers on branches show PMSF bootstrap percentages (bootstrap support PMSF; 100 true bootstrap replicates), then ultrafast bootstrap approximation percentages (1,000 replicates). Filled circles denote 100% support with both methods. The branches leading to Bodo, Diplonema and Tetrahymena are shown reduced by 1/3. Scale bar denotes 0.1 expected substitutions per site.
Extended Data Fig. 7 | Unrooted phylogeny using 61-SR4 dataset of 61 taxa. Phylogeny inferred from 351 genes, with amino acids recoded as four states, using maximum likelihood under the GTR + R6 + F model. The numbers on branches show bootstrap percentages (500 true bootstrap replicates). Filled circles represent 100% support. The branches leading to Bodo, Diplonema and Tetrahymena are shown reduced by 1/3. Scale bar denotes 0.1 expected substitutions per site.
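As a minimal sketch of what recoding amino acids as four states could look like, the Python below assumes the SR4 grouping usually attributed to Susko & Roger (ref. 54): AGNPST, CHWY, DEKQR and FILMV. The grouping, the state labels (A, C, G, T) and the function name are assumptions made for illustration; they are not taken from this legend and do not reproduce the authors' own recoding step.

```python
# Assumed SR4 grouping; the four state labels are arbitrary placeholders.
SR4_GROUPS = {
    "A": "AGNPST",
    "C": "CHWY",
    "G": "DEKQR",
    "T": "FILMV",
}

# Invert the grouping into a residue -> state lookup table.
RESIDUE_TO_STATE = {
    residue: state
    for state, residues in SR4_GROUPS.items()
    for residue in residues
}


def recode_sr4(sequence: str) -> str:
    """Recode an aligned amino acid sequence into four states.

    Gaps and any character outside the 20 standard residues are
    written as '-', so alignment length is preserved.
    """
    return "".join(RESIDUE_TO_STATE.get(ch, "-") for ch in sequence.upper())


if __name__ == "__main__":
    print(recode_sr4("MKV-AX"))
```

Collapsing 20 states into four discards some signal but is intended to reduce the effect of compositional heterogeneity and substitutional saturation, which is why a four-state model such as GTR can then be applied to the recoded alignment.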
Extended Data Fig. 8 | Summary of 61-SFSR analysis. Chart follows the support for several important bipartitions with the sequential removal of the fastest-evolving sites from the 61-taxon, 351-gene dataset. The support values are ultrafast bootstrap approximation percentages (1,000 replicates) inferred using maximum likelihood under the LG + C60 + F + Γ-derived PMSF model using a guide tree pruned of hemimastigotes (PMSF-nHEMI, see Methods); these values are not directly comparable to those from the other illustrated analyses.
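The sequential removal of fastest-evolving sites summarised in this chart can be sketched as follows. This is a hypothetical illustration only: it assumes per-site rates have already been estimated by some external tool, and the function name, the `step` parameter and the toy rates are invented for the example. It is not the authors' script, and the increment they used is not stated in this legend.

```python
from typing import Iterator, List, Sequence, Tuple


def fast_site_removal_steps(
    site_rates: Sequence[float], step: int
) -> Iterator[Tuple[float, List[int]]]:
    """Yield (percent of sites remaining, retained site indices).

    The first item is the full alignment; each later item drops a
    further `step` of the fastest-evolving sites (hypothetical increment).
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    n_sites = len(site_rates)
    # Site indices ordered from slowest to fastest estimated rate.
    slow_to_fast = sorted(range(n_sites), key=lambda i: site_rates[i])
    removed = 0
    while removed < n_sites:
        # Keep the slowest sites, restored to alignment order.
        kept = sorted(slow_to_fast[: n_sites - removed])
        yield 100.0 * len(kept) / n_sites, kept
        removed += step


if __name__ == "__main__":
    toy_rates = [0.1, 2.0, 0.5, 3.0]  # made-up rates for four sites
    for percent, kept in fast_site_removal_steps(toy_rates, step=1):
        print(f"{percent:.0f}% of sites remaining: {kept}")
```

Each retained-site list would define one reduced alignment, and bipartition support would be re-estimated on it and plotted against the percentage of sites remaining.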
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistical parameters
When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main
text, or Methods section).
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Data analysis (A) SSU rRNA/DNA data: (Profile) Alignment: MUSCLE in Seaview v 4.6, Site selection: Gblocks (with manual correction) in Seaview 4.6,
Maximum likelihood phylogenetic analysis: RAxML v 8.2.6. Alignment of short environmental sequence reads to full alignment: PaPaRa v
2.5, Phylogenetic placement of short reads: PPlacer v 1.1.
(B) Phylogenomic Datasets: Alignment: MAFFT v 7.0, Site selection: BMGE v 1.0; Maximum likelihood phylogenetic analyses: IQ-TREE v
1.4.4, Bayesian Inference of Phylogeny: PhyloBayes v 4.1. Identification of outlier long-branch taxa for a derivative phylogenetic analysis
done with a short script available on request (to L. Eme). Taxon removal from datasets as part of derivative fast-site removal phylogenetic
analysis with Phyx v.0.1.
April 2018
(C) Gene identity analysis: HMM building and searches: Hmmer-3.1b2. Alignment: MUSCLE in Seaview v. 4.6 (profile alignment) or MAFFT
v 7.0 (full alignment). Where Phylogenies performed: Site selection: BMGE v 1.1 (full alignment), phylogenetic analysis: IQ-TREE v 1.5.5.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers
upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Field-specific reporting
Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf
Data exclusions A small number of individual genes were excluded from the phylogenomic datasets if they were determined to be likely contaminants, paralogues or similarly spurious data. This determination was made from single-gene phylogenetic trees for each of the genes included. This is reported in the Methods and is a standard and necessary procedure for eukaryote-wide phylogenomic analyses.
Replication NOT RELEVANT TO STUDY - All analyses were performed on fixed sets of sequence data and are, for all intents and purposes, intrinsically reproducible. Convergence behaviour of the MCMC Bayesian phylogenetic analysis is reported in the Methods.
Randomization NOT RELEVANT TO STUDY - There were no experimental groups to assign individuals to at any point in the study
Blinding NOT RELEVANT TO STUDY - There were no experimental groups to assign individuals to at any point in the study
Obtaining unique materials during this study (and are described in the methods). They are available from the authors on request (as a matter of basic
scientific ethics). Note that these are not 'eukaryote cell lines' in the normal sense of the term, thus we have marked "n/a" for
Field-collected samples A single small field-collection of soil was the source of the cells and cultures examined in this study. The sample was kept
hydrated at room temperature and ambient light for 4 weeks prior to, and during, the isolations and observations of unicellular
protists reported in the study.