
Lax 2018 Hemimastigophora Supra Kingdom

The document discusses the discovery of Hemimastigophora, a new lineage of eukaryotes that is morphologically distinct and has been placed outside established eukaryote supergroups. It describes the first culture of a hemimastigote species, Hemimastix kukwesjijk, and provides details on its morphology and ecological significance. The findings highlight the evolutionary importance of this group for understanding the origins of eukaryotic life and the last common ancestor of eukaryotes.


Letter https://doi.org/10.1038/s41586-018-0708-8

Hemimastigophora is a novel supra-kingdom-level lineage of eukaryotes
Gordon Lax1,4, Yana Eglit1,4, Laura Eme2,3,4, Erin M. Bertrand1, Andrew J. Roger2 & Alastair G. B. Simpson1*

Almost all eukaryote life forms have now been placed within one of five to eight supra-kingdom-level groups using molecular phylogenetics [1–4]. The ‘phylum’ Hemimastigophora is probably the most distinctive morphologically defined lineage that still awaits such a phylogenetic assignment. First observed in the nineteenth century, hemimastigotes are free-living predatory protists with two rows of flagella and a unique cell architecture [5–7]; to our knowledge, no molecular sequence data or cultures are currently available for this group. Here we report phylogenomic analyses based on high-coverage, cultivation-independent transcriptomics that place Hemimastigophora outside of all established eukaryote supergroups. They instead comprise an independent supra-kingdom-level lineage that most likely forms a sister clade to the ‘Diaphoretickes’ half of eukaryote diversity (that is, the ‘stramenopiles, alveolates and Rhizaria’ supergroup (Sar), Archaeplastida and Cryptista, as well as other major groups). The previous ranking of Hemimastigophora as a phylum understates the evolutionary distinctiveness of this group, which has considerable importance for investigations into the deep-level evolutionary history of eukaryotic life, ranging from understanding the origins of fundamental cell systems to placing the root of the tree. We have also established the first culture of a hemimastigote (Hemimastix kukwesjijk sp. nov.), which will facilitate future genomic and cell-biological investigations into eukaryote evolution and the last eukaryotic common ancestor.

We identified two previously undescribed species of the rarely observed protist group Hemimastigophora (one Spironema and one Hemimastix) in enrichments from soil. Here we formally describe the newly identified Hemimastix species.

Hemimastix Foissner, Blatterer & Foissner 1988

Hemimastix kukwesjijk Eglit and Simpson, sp. nov.

Etymology. Kukwesjijk (approximate pronunciation, ‘ku–ga–wes–jij–k’). ‘Kukwes-’ (Mi’kmaq), a rapacious, hairy ogre from the traditions of the Mi’kmaq First Nation of Nova Scotia; ‘-jijk’, a diminutive plural suffix. ‘Little ogres’ reflects the predatory and hairy nature of this microorganism, and the use of Mi’kmaq language and tradition acknowledges the region in which the species was isolated.
Type material. The name-bearing hapantotype consists of trophic cells and dividing cells of strain BW2H that are osmium-fixed, sputter-coated and mounted for scanning electron microscopy. This material is deposited with the American Museum of Natural History (New York) with accession code AMNH_IZC 00267132. This material also contains prey Spumella sp. (Stramenopiles) and uncharacterized prokaryotes, both of which are explicitly excluded from the hapantotype.
Description. Hemimastix species, 16.5–20.5-μm long with 17–19 flagella per row.
Type locality. Bluff Wilderness Trail, Nova Scotia, Canada (44.6610154° N, 63.7674669° W); soil from mixed-species woodland.
Gene sequence. The partial small subunit ribosomal RNA (SSU rRNA) gene sequence of strain BW2H has been deposited in GenBank, accession code MF682191.
Comments. Cells are larger and have several more flagella than Hemimastix amphikineta, the only previously described species (14-μm by 7-μm cell body, 12 flagella per row [6]).

Fig. 1 | Micrographs of studied hemimastigotes. a, Spironema cf. multiciliatum, cell 1 (of 4) isolated for transcriptomics. b–f, H. kukwesjijk, cell 1 (of 2) isolated for transcriptomics (b); note the presence of the capitulum (cap.). c, d, Cells from culture (strain BW2H); note the nucleus and the contractile vacuole at the posterior (c), and feeding on prey with the capitulum (d). e, General view of cell (strain BW2H), anterior with the capitulum to right. f, Detail of the capitulum, showing caps of undischarged extrusomes (arrowheads) and close-spaced flagella in anterior part of flagellar rows. a–d, Differential interference contrast light microscopy. e, f, Scanning electron microscopy. Scale bars, 10 μm (a), 5 μm (b–e; scale bar in b applies to images b–d), 1 μm (f).

Cells of H. kukwesjijk are oval in profile with a blunt anterior projection (the capitulum) and two rows of flagella along their whole length (Fig. 1b, Extended Data Fig. 1). In cultivation as strain BW2H, live cells were 16.5–20.5-μm long by 7–12.5-μm wide (18.3 ± 1 μm × 9.9 ± 1.2 μm; n = 61), with a sub-central, rounded nucleus and posterior contractile vacuole (Fig. 1c). Each row of 17–19 flagella (mean 18.4; n = 25) lay in a channel between the two thick thecal plates. The anteriormost 9 or 10 flagella were closely spaced, and the rest emerged from separate notches in the underlying plate (Fig. 1b, e). The capitulum was bordered by the overlapping anterior

1 Department of Biology, Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada.
2 Centre for Comparative Genomics and Evolutionary Bioinformatics, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
3 Present address: Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden.
4 These authors contributed equally: Gordon Lax, Yana Eglit, Laura Eme.
*e-mail: [email protected]

Nature | www.nature.com/nature
© 2018 Springer Nature Limited. All rights reserved.

Sample    Habitat           Environment
AR1.V9    Stream water      Polar
BS1.V9    Anoxic sediment   Temperate
AR2.V9    Stream water      Polar
AR3.V9    Lake water        Polar
BF1.V9    Water column      Temperate

Sample    Ratio   Habitat                     Environment
SS1.Mt    1       Soil                        Polar
SS2.Mt    1       Soil                        Polar
OS1.V4    1       Water column                Subpolar
BM1.V4    1       Sediment                    Temperate
OB1.V4    1       Sediment and water column   Temperate
BM2.V4    1       Water column                Temperate
LS1.V4    1       Soil                        Tropical
SA1.V4    1       Water column                Polar
LS2.V4    0.98    Soil                        Tropical
LS3.V4    0.98    Soil                        Tropical
BM3.V4    0.97    Sediment                    Temperate
BM4.V4    0.96    Sediment                    Temperate
LS4.V4    0.96    Soil                        Tropical
OB2.V4    0.91    Sediment and water column   Temperate
OB3.V4    0.80    Sediment and water column   Temperate
OB4.V4    0.75    Sediment and water column   Temperate

Tree labels: Hemimastix; Clone AY689797 (stream sediment, temperate); Spironema.
Habitat key: freshwater, marine, brackish, terrestrial.

Sample    Ratio   Habitat        Environment
LS5.V4    0.94    Soil           Tropical
BM5.V4    0.90    Sediment       Temperate
BM6.V4    0.85    Sediment       Temperate
LS6.V4    0.81    Soil           Tropical
AR4.V9    0.80    Lake water     Polar
AR5.V9    0.71    Stream water   Polar
AR6.V9    0.70    Stream water   Polar
BS2.V9    0.67    Anoxic water   Temperate
BS3.V9    0.66    Anoxic water   Temperate
AR7.V9    0.65    Lake water     Polar
BS4.V9    0.62    Anoxic water   Temperate
AR8.V9    0.58    Lake water     Polar
AR9.V9    0.56    Lake water     Polar
AR10.V9   0.55    Lake water     Polar
AP1.V9    0.52    Sediment       Polar
AR11.V9   0.51    Stream water   Polar

Fig. 2 | Environmental SSU rRNA/rDNA reads assigned to Hemimastigophora. The pplacer likelihood-to-weight ratio, habitat and environmental zone are reported for each read (denoted by sample code in the left columns). Reads with a likelihood-to-weight ratio > 0.5 are assigned to a branch. Five assigned sequences (denoted by a circled ‘?’) were of uncertain placement within Hemimastigophora; that is, the likelihood-to-weight ratio for any single branch within the clade was < 0.5, but the sum of all likelihood-to-weight ratios = 1. See Extended Data Fig. 3 for full reference tree. Supplementary Table 1 gives additional information on individual reads, including sample codes.
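
The assignment rule in this caption, together with the summed-ratio cutoff of 0.9 given in the Methods, can be sketched in a few lines of Python. This is only an illustration of the thresholds as stated in the text, not the authors' script and not pplacer's own interface; the function name, the dictionary layout and the branch names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Thresholds as stated in the Fig. 2 caption and the Methods.
SINGLE_BRANCH_CUTOFF = 0.5   # ratio needed to assign a read to one branch
SUMMED_CUTOFF = 0.9          # summed ratio needed to keep a read within the clade


def classify_read(
    lwr_by_branch: Dict[str, float],
    clade_branches: Set[str],
) -> Tuple[str, Optional[str]]:
    """Classify one read from its per-branch likelihood-to-weight ratios.

    lwr_by_branch maps a (hypothetical) branch name to the ratio reported
    for that placement; clade_branches lists the branches inside the clade
    of interest. Returns (status, branch), where branch is set only when
    the read is assigned to a single branch.
    """
    in_clade = {b: w for b, w in lwr_by_branch.items() if b in clade_branches}
    if not in_clade:
        return ("rejected", None)

    best = max(in_clade, key=in_clade.get)
    if in_clade[best] > SINGLE_BRANCH_CUTOFF:
        return ("assigned", best)

    # No single branch dominates, but the clade as a whole is well supported:
    # this corresponds to the reads marked with a circled '?' in the figure.
    if sum(in_clade.values()) > SUMMED_CUTOFF:
        return ("uncertain", None)

    return ("rejected", None)
```

For example, a read with ratios of 0.4, 0.35 and 0.25 spread over three branches inside the clade would be kept as "uncertain", whereas a read with 0.98 on one branch would be assigned to that branch.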

ends of the flagellar rows, and the adjacent plate margins housed extrusomes (undischarged, Fig. 1f; discharged, Extended Data Fig. 2c). Cells fed on a small stramenopile (Spumella sp.) after attachment at the capitulum, and enclosure by the anterior flagella (Fig. 1d, Extended Data Figs. 1h–k, 2a, b).

The Spironema species awaits formal description; the cells we isolated, which we discuss here as Spironema cf. multiciliatum (see ‘Identification of Spironema cf. multiciliatum’ in Methods), were spindle-shaped with a thin ‘tail’. These cells were 23–31-μm long by 4–7.5-μm wide (mean ± s.d., 27.4 ± 3.5 μm × 5.4 ± 1.6 μm; n = 7), with an oval nucleus and two rows of six or more flagella clustered in the anterior quarter, plus two or three flagella per row more posteriorly (Fig. 1a, Extended Data Fig. 1a).

We determined SSU rRNA sequences from both hemimastigotes, and used these to analyse published environmental sequence datasets to determine (1) the distribution of the group across habitats and (2) whether these sequences matched a known environmental clade. Unlike some other recently characterized lineages (for example, ref. 8), hemimastigotes do not appear to belong to a previously identified environmental clade. One unclassified long-read clone from freshwater sediment (AY689797) was phylogenetically related to Spironema (Fig. 2, Extended Data Fig. 3). An additional 37 short reads were detected among V4 or V9 amplicon datasets, or soil metatranscriptomes (Fig. 2, Supplementary Table 1). Many of the V4 and V9 amplicons derived from soil or freshwater, consistent with most light microscopy accounts [7]. However, nearly half of these amplicons came from marine sediment or water-column samples (Fig. 2), and one Hemimastix-like V4 amplicon was among the 25 most-abundant operational taxonomic units in a fjord sediment dataset (Supplementary Table 1).

To place hemimastigotes in the tree of eukaryotes, we generated transcriptomes from isolated single cells of both Spironema and Hemimastix, and assembled 351-gene datasets with a broad sampling of eukaryote taxa (initially 104 taxa; this was reduced to 61 taxa for computationally intensive analyses). The transcriptomes proved to be high-coverage (Spironema, 290 of 351 = 82.6% of genes and 77.6% of sites represented; Hemimastix, 280 of 351 = 79.7% of genes, 72.1% of sites). Maximum likelihood analyses of both the 104-taxon and 61-taxon datasets were consistent with other recent phylogenomic studies [3,9,10] in dividing previously known eukaryotes into three clans: Diaphoretickes, Discoba and an ‘Amorphea+’ assemblage (Fig. 3, Extended Data Fig. 4). The major subgroups of Diaphoretickes were Sar plus Telonema, Haptophyta plus Centrohelida, and Cryptista plus Archaeplastida and Picozoa. The ‘Amorphea+’ group contained Obazoa and Amoebozoa, as well as collodictyonids, rigifilids, Mantamonas, Ancyromonadida, Malawimonadida and Metamonada. The position of metamonads was unstable, which mirrors conflicts seen in other recent analyses [9,11].

Spironema and Hemimastix formed a maximally supported Hemimastigophora clade that was phylogenetically isolated. The 104-taxa analysis placed Hemimastigophora amongst the deepest branches within Diaphoretickes, as the sister of a clade of Sar, Telonema, haptophytes and centrohelids, though with equivocal support (ultra-fast bootstrap approximation = 83%; Fig. 4, Extended Data Fig. 4). In the 61-taxa analysis, Hemimastigophora again grouped with Diaphoretickes (bootstrap support = 100% (posterior mean site frequency method); ultra-fast bootstrap approximation = 93%; Bayesian posterior probability = 1), but actually branched sister to all of the other Diaphoretickes, which formed a clade (bootstrap support = 88%; ultra-fast bootstrap approximation = 60%; Bayesian posterior probability = 1; Fig. 3).

To further explore the position of Hemimastigophora, we analysed several derivatives of the 61-taxon dataset that excluded potential sources of phylogenetic inaccuracy. Analyses that (i) excluded the three taxa identified as outlier long-branches (dataset referred to as ‘58-nLB’), (ii) excluded the three data-poorest taxa (site coverage < 30%; dataset referred to as ‘58-nDP’) or (iii) recoded the amino


Taxa shown in the Fig. 3 tree, by group:
Diaphoretickes: Sar (Toxoplasma, Vitrella, Karenia, Protocruzia, Tetrahymena, Ectocarpus, Phytophthora, Schizochytrium, Stramenopile MMETSP1104, Bigelowiella, Paulinella, Gromia); Telonema; Haptophyta (Pleurochrysis, Prymnesium, Pavlovales CCMP2436); Centrohelida (Acanthocystis, Choanocystis); Archaeplastida (Arabidopsis, Physcomitrella, Micromonas, Volvox, Cyanoptyche, Gloeochaete, Chondrus, Porphyridium, Galdieria); Picozoa PB58411a; Cryptista (Guillardia, Goniomonas, Roombia, Palpitomonas).
Hemimastigophora: Spironema, Hemimastix.
Discoba: Bodo, Diplonema, Eutreptiella, Naegleria, Pharyngomonas, Tsukubamonas, Reclinomonas, Andalucia.
Amorphea+: Obazoa (Monosiga, Homo, Amoebidium, Capsaspora, Fonticula, Spizellomyces, Thecamonas); Amoebozoa (Mastigamoeba, Physarum, Acanthamoeba, Vannella); CRuMs (Rigifila, Diphylleia, Mantamonas); Malawimonadida (Malawimonas, Gefionella); Ancyromonadida (Fabomonas, Nutomonas); Metamonada (Trimastix).
Scale bar, 0.1.

Fig. 3 | Phylogenetic placement of Hemimastigophora within eukaryotes. Unrooted phylogeny inferred from 351 genes and 61 taxa, using maximum likelihood under the ‘LG + C60 + F + Γ’ model. The numbers on branches show, in order from top to bottom or from left to right, posterior mean site frequency (PMSF) bootstrap percentages (bootstrap support; 200 true bootstrap replicates), ultrafast bootstrap approximation percentages (1,000 replicates) and Bayesian posterior probabilities (under the ‘CAT + GTR’ model). Filled circles denote maximum support with all methods (that is, 100, 100 and 1, respectively). The three longest branches (leading to Bodo, Diplonema and Tetrahymena) are shown reduced by 1/3. CRuMs: collodictyonids, rigifilids and Mantamonas. NA, not available. Scale bar denotes 0.1 expected substitutions per site.
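
As an aid to reading the branch labels described in this caption, the following Python sketch splits a three-part label such as ‘81/53/NA’ into its PMSF bootstrap, ultrafast bootstrap approximation and Bayesian posterior values, and checks for the maximum-support case that the figure marks with a filled circle. The slash-separated label format is taken from the figure; the class and function names are hypothetical and are not part of any phylogenetics package.

```python
from typing import NamedTuple, Optional


class BranchSupport(NamedTuple):
    """Support values for one branch, in the order given in the caption."""
    pmsf_bootstrap: Optional[float]   # PMSF bootstrap percentage
    ufboot: Optional[float]           # ultrafast bootstrap approximation percentage
    posterior: Optional[float]        # Bayesian posterior probability


def parse_support(label: str) -> BranchSupport:
    """Parse a label of the form 'BS/UFB/PP'; 'NA' becomes None."""
    parts = label.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated values, got {label!r}")
    values = [None if p.strip().upper() == "NA" else float(p) for p in parts]
    return BranchSupport(*values)


def is_maximal(support: BranchSupport) -> bool:
    """True when all three methods give maximum support (100, 100 and 1)."""
    return support == BranchSupport(100.0, 100.0, 1.0)
```

Keeping ‘NA’ as `None` rather than zero avoids treating an analysis that was not available as one that gave no support.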

acid data into four categories (dataset referred to as ‘61-SR4’) all supported the same topology as the original 61-taxa analysis, that is, Hemimastigophora outside of and sister to Diaphoretickes (Fig. 4, Extended Data Figs. 5–7). However, removing fast-evolving sites did not systematically favour the tree inferred in the 61-taxa analysis over a topology in which Hemimastigophora is sister to a Sar + Telonema + Haptophyta + Centrohelida clade (as in the 104-taxa analysis; Extended Data Fig. 8). Thus, although most analyses place Hemimastigophora as branching outside other Diaphoretickes, the alternative position, in which hemimastigotes fall one node inside Diaphoretickes, remains credible (Fig. 4).

All previous proposals for the phylogenetic or systematic placement of Hemimastigophora were based on morphology alone. The sub-membranous thecal plates between the two rows of flagella suggested an affinity with euglenids, which have a pellicle [6,7]. Subsequently, affinities were proposed with completely different taxa that have pellicular or thecal structures, namely alveolates [12], or apusomonads and ancyromonads [13]. A placement within Rhizaria was also suggested on the basis of flagellum and extrusome substructure [14]. None of these proposals is supported by our phylogenies, because Hemimastigophora is always distantly related to euglenids (Euglenozoa, in Discoba), apusomonads and ancyromonads (both in Amorphea+) and Sar (which contains Alveolata and Rhizaria).

Instead, the extremely deep phylogenetic position of Hemimastigophora (most likely at the base of Diaphoretickes) implies that they represent a novel, supra-kingdom-level lineage. This identifies hemimastigotes as a crucial group to include in descriptions of the tree of eukaryote life, and in most studies of the evolution of eukaryotic cells. This is especially important when inferring the history of eukaryotic innovations, or the nature of the last eukaryotic common ancestor, from the distributions across supergroups of particular genes, genome characteristics or cellular features [15–19]. Hemimastigotes may be equally important in the immensely challenging task of placing the root of the eukaryote tree. The root is usually inferred [20–23] to lie somewhere between the largest eukaryote clans (approximately in one of the positions marked a, b or c in Fig. 4), with position a (between Amorphea, and Diaphoretickes plus Discoba) currently being the most favoured [22,23]. Hemimastigophora appears to lie close to all of these positions on the unrooted tree (see Fig. 4), and could be our only known representative of one of the most ancient divisions amongst extant eukaryotes. Accordingly, we searched the single-cell transcriptomes for genes that could have arisen during the divergences between supergroups (Fig. 4, Supplementary Table 3). We found several genes in hemimastigotes that are not known from Diaphoretickes, including those for myosin II (previously known from Amorphea, and one subgroup of Discoba [18,24]) and Golgi protein GCP16 (also known as golgin A7; previously specific to Amorphea) [19]. The presence of such genes in hemimastigotes either pushes back their likely origins to before the last eukaryotic common ancestor (or supports this inference) or, more controversially, could be due to the root of


Fig. 4 panel labels. Support for Hemimastigophora as a deep branch relative to Diaphoretickes (BS/UFB): 61-taxa, 88/60; 58-nLB, 96/65; 58-nDP, 91/63; 61-SR4, 95. Support for the alternate topology (Hemimastigophora within Diaphoretickes): 104-taxa, 83 UFB.
Clades shown: Diaphoretickes (Sar (Alveolata, Stramenopila, Rhizaria) + Telonemia; Haptophyta + Centrohelida; Archaeplastida + Picozoa; Cryptista), Hemimastigophora, Discoba and Amorphea+ (Obazoa, Amoebozoa, Metamonada). Candidate root positions are marked a, b and c.

Fig. 4 | Summary of phylogenomic analyses and distribution of select genes across eukaryotes. Left, inferred phylogenetic positions of Hemimastigophora. Box with solid outline details the support for Hemimastigophora as a deep branch relative to the ‘Diaphoretickes’ supergroup, in various analyses. BS, PMSF bootstrap support (except 61-SR4 for which the ‘GTR + R6 + F’ model was used). UFB, ultrafast bootstrap approximation support. Dashed box shows support for the alternative topology (Hemimastigophora as a deep branch within Diaphoretickes) in the 104-taxa analysis. Stepwise fast-site removal analyses equivocated between these alternatives (see Extended Data Fig. 8). Labels a, b and c show the possible positions of the eukaryote root; the likely placement of Hemimastigophora results in several variants of position c. Right, known distributions of selected proteins encoded by genes with proposed deep origins among living eukaryotes that were detected in hemimastigote transcriptomes. Boxes filled in their top half denote genes detected in Spironema; boxes filled in their bottom half denote genes detected in Hemimastix; completely filled boxes represent genes detected in both organisms; see Supplementary Table 3 for details.

eukaryotes being further from the base of Amorphea than generally supposed [23]; that is, Amorphea and Hemimastigophora being on the same side of the root (shown by the top variant of position c in Fig. 4). However, another hemimastigote myosin-family gene was previously unknown outside the Sar clade (Fig. 4): irrespective of the final position of the root, this survey demonstrates that the antiquity of gene origins tends to be underestimated until all major lineages are considered. This bias can result in the underestimation of the gene content of ancient eukaryotes, and thus overestimations of the simplicity of their cell biology. Examining hemimastigote genomes, and ultimately their cell biology, will be valuable for better understanding eukaryote evolution at the deepest levels.

This study has used single-cell transcriptomics to unveil a deep-branching eukaryote lineage. Single-cell transcriptomics and genomics [25–27] bypass the ‘culture bottleneck’ and thus provide a rapid path to deeper taxon sampling, even when species from a group of interest are eventually cultivated. This is particularly valuable for phylogenomics, in which inaccuracy owing to poor taxon sampling is a perpetual concern [28]. For this application, single-cell transcriptomics outperforms single-cell genomics because of better coverage of housekeeping genes (see, for example, refs 26,27). Information on multiple related species is also valuable for ensuring data fidelity (detecting contaminants, gene transfers and so on; see Methods). Single-cell techniques are especially promising for the heterotrophic protozoa that probably represent most ‘undiscovered’ major lineages, and for which establishing cultures with suitable prey or hosts can be challenging [25,27,29,30].

In this molecular phylogenetic investigation of Hemimastigophora, we show that they are a previously unrecognized supergroup of eukaryotes. Their phylogenetic distinctiveness is comparable to the whole animal plus fungi clade (Opisthokonta) or the assemblage containing all land plants and primary algae (Archaeplastida). We expect the discovery or recognition of other important lineages will greatly accelerate owing to similar applications of single-cell methods.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, statements of data availability and associated accession codes are available at https://doi.org/10.1038/s41586-018-0708-8.

Received: 26 October 2017; Accepted: 21 September 2018;
Published online xx xx xxxx.

1. Burki, F. The eukaryotic tree of life from a global phylogenomic perspective. Cold Spring Harb. Perspect. Biol. 6, a016147 (2014).
2. Worden, A. Z. et al. Rethinking the marine carbon cycle: factoring in the multifarious lifestyles of microbes. Science 347, 1257594 (2015).
3. Burki, F. et al. Untangling the early diversification of eukaryotes: a phylogenomic study of the evolutionary origins of Centrohelida, Haptophyta and Cryptista. Proc. R. Soc. Lond. B 283, 20152802 (2016).
4. Simpson, A. G. B. & Eglit, Y. in Encyclopedia of Evolutionary Biology Vol. 3 (ed. Kliman, R. M.) 344–360 (Elsevier, Amsterdam, 2016).
5. Klebs, G. Flagellatenstudien (Akademische Verlags-Gesellschaft, Leipzig, 1893).
6. Foissner, W., Blatterer, H. & Foissner, I. The Hemimastigophora (Hemimastix amphikineta nov. gen., nov. spec.), a new protistan phylum from Gondwanian soils. Eur. J. Protistol. 23, 361–383 (1988).
7. Foissner, I. & Foissner, W. Revision of the family Spironemidae Doflein (Protista, Hemimastigophora), with description of two new species, Spironema terricola n. sp. and Stereonema geiseri n. g., n. sp. J. Eukaryot. Microbiol. 40, 422–438 (1993).
8. Yubuki, N. et al. Morphological identities of two different marine stramenopile environmental sequence clades: Bicosoeca kenaiensis (Hilliard, 1971) and Cantina marsupialis (Larsen and Patterson, 1990) gen. nov., comb. nov. J. Eukaryot. Microbiol. 62, 532–542 (2015).
9. Brown, M. W. et al. Phylogenomics demonstrates that breviate flagellates are related to opisthokonts and apusomonads. Proc. R. Soc. Lond. B 280, 20131755 (2013).
10. Zhao, S. et al. Collodictyon: an ancient lineage in the tree of eukaryotes. Mol. Biol. Evol. 29, 1557–1568 (2012).
11. Cavalier-Smith, T. et al. Multigene eukaryote phylogeny reveals the likely protozoan ancestors of opisthokonts (animals, fungi, choanozoans) and Amoebozoa. Mol. Phylogenet. Evol. 81, 71–85 (2014).
12. Cavalier-Smith, T. A revised six-kingdom system of life. Biol. Rev. Camb. Philos. Soc. 73, 203–266 (1998).
13. Cavalier-Smith, T. in The Flagellates, The Systematics Association Special Volume Series 59 (eds Leadbeater, B. S. C. & Green, J. C.) 361–390 (Taylor & Francis, London, 2000).
14. Cavalier-Smith, T., Lewis, R., Chao, E. E., Oates, B. & Bass, D. Morphology and phylogeny of Sainouron acronematica sp. n. and the ultrastructural unity of Cercozoa. Protist 159, 591–620 (2008).
15. Speijer, D., Lukeš, J. & Eliáš, M. Sex is a ubiquitous, ancient, and inherent attribute of eukaryotic life. Proc. Natl Acad. Sci. USA 112, 8827–8834 (2015).
16. de Mendoza, A. et al. Transcription factor evolution in eukaryotes and the assembly of the regulatory toolkit in multicellular lineages. Proc. Natl Acad. Sci. USA 110, E4858–E4866 (2013).
17. Fukasawa, Y., Oda, T., Tomii, K. & Imai, K. Origin and evolutionary alteration of the mitochondrial import system in eukaryotic lineages. Mol. Biol. Evol. 34, 1574–1586 (2017).


18. Sebé-Pedrós, A., Grau-Bové, X., Richards, T. A. & Ruiz-Trillo, I. Evolution and classification of myosins, a paneukaryotic whole-genome approach. Genome Biol. Evol. 6, 290–305 (2014).
19. Barlow, L. D., Nývltová, E., Aguilar, M., Tachezy, J. & Dacks, J. B. A sophisticated, differentiated Golgi in the ancestor of eukaryotes. BMC Biol. 16, 27 (2018).
20. He, D. et al. An alternative root for the eukaryote tree of life. Curr. Biol. 24, 465–470 (2014).
21. Katz, L. A., Grant, J. R., Parfrey, L. W. & Burleigh, J. G. Turning the crown upside down: gene tree parsimony roots the eukaryotic tree of life. Syst. Biol. 61, 653–660 (2012).
22. Derelle, R. & Lang, B. F. Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol. Biol. Evol. 29, 1277–1289 (2012).
23. Derelle, R. et al. Bacterial proteins pinpoint a single eukaryotic root. Proc. Natl Acad. Sci. USA 112, E693–E699 (2015).
24. Richards, T. A. & Cavalier-Smith, T. Myosin domain evolution and the primary divergence of eukaryotes. Nature 436, 1113–1118 (2005).
25. Kolisko, M., Boscaro, V., Burki, F., Lynn, D. H. & Keeling, P. J. Single-cell transcriptomics for microbial eukaryotes. Curr. Biol. 24, R1081–R1082 (2014).
26. Yoon, H. S. et al. Single-cell genomics reveals organismal interactions in uncultivated marine protists. Science 332, 714–717 (2011).
27. Gawryluk, R. M. R. et al. Morphological identification and single-cell genomics of marine diplonemids. Curr. Biol. 26, 3053–3059 (2016).
28. Keeling, P. J. et al. The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12, e1001889 (2014).
29. Caron, D. A. et al. Probing the evolution, ecology and physiology of marine protists using transcriptomics. Nat. Rev. Microbiol. 15, 6–20 (2017).
30. Krabberød, A. K. et al. Single cell transcriptomics, mega-phylogeny, and the genetic basis of morphological innovations in Rhizaria. Mol. Biol. Evol. 34, 1557–1573 (2017).

Acknowledgements. The authors thank P. Li and P. Scallion (Dalhousie University) for assistance with electron microscopy, M. Dlutek (Dalhousie University) for Illumina sequencing, S. Geisen (Wageningen University) for providing parsed metatranscriptomic data, F. Mahé (CIRAD, Montpellier) for access to and parsing much of the V4 data, M. Brown (Mississippi State) for the seed phylogenomic dataset, A. Sebé-Pedrós (Weizmann Institute of Science) for the seed myosin alignments, M. Kolisko (Institute of Parasitology, Czech Academy of Sciences) for data handling scripts, B. Q. Minh (University of Vienna) for substantial help with phylogenomic analyses and troubleshooting in IQ-TREE, and R. Lewis (Nova Scotia Museum) and B. Francis for advice on Mi’kmaq tradition and language. This work was supported by CIFAR, NSERC grant 298366-2014 to A.G.B.S. and NSERC grant 2016-016792 to A.J.R.

Reviewer information. Nature thanks I. Ruiz-Trillo and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author contributions. Y.E. isolated the organisms and cultivated H. kukwesjijk. Y.E. and G.L. undertook the microscopy. G.L. performed the single-cell transcriptomics. Y.E., G.L. and E.M.B. analysed the rDNA and environmental sequence data. G.L., L.E., Y.E. and A.G.B.S. assembled the phylogenomic datasets. G.L., L.E. and A.J.R. performed phylogenomic analyses. L.E. and Y.E. performed the gene presence analyses. G.L., Y.E. and A.G.B.S. wrote the manuscript, with input from all co-authors.

Competing interests. The authors declare no competing interests.

Additional information
Extended data is available for this paper at https://doi.org/10.1038/s41586-018-0708-8.
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-018-0708-8.
Reprints and permissions information is available at http://www.nature.com/reprints.
Correspondence and requests for materials should be addressed to A.G.B.S.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Methods

No statistical methods were used to predetermine sample size.

Cell isolation and transcriptomics. Soil from mixed-species woodland in the Bluff Wilderness Trail in Nova Scotia, Canada (44.6610154° N, 63.7674669° W; 17 April 2016) was kept hydrated with distilled water in a Petri dish until hemimastigotes were observed about four weeks later. Single Spironema and Hemimastix cells were isolated with drawn-out micropipettes, photo-documented by differential interference contrast light microscopy (using a Zeiss Axiovert 200M and AxioCam ICc5 microscope and camera system; Carl Zeiss AG), and subjected to single-cell transcriptomics using the Smart-seq2 protocol [31] with modifications. In brief, four (Spironema) or two (Hemimastix) cells were individually picked into 0.2% Triton X-100 lysis buffer, immediately frozen in liquid nitrogen, then thawed and re-frozen three times. The remaining procedure followed the original protocol, with 20 (Spironema) or 18 (Hemimastix) PCR cycles. cDNA quantity and quality were assessed (i) by Qubit dsDNA HS assay (Thermo Fisher, Q32851) and (ii) by PCR and cloning of cDNA fragments into StrataClone SoloPack competent cells (Agilent Technologies), followed by Sanger sequencing of 12 clones each. After library preparation with Illumina Nextera XT, sequencing was carried out on an Illumina MiSeq with 2 × 250-bp dual reads, with the libraries multiplexed on the same run.

Cultivation of H. kukwesjijk. To cultivate H. kukwesjijk strain BW2H, three cells were picked and washed with a micropipette, then transferred to a prey, Spumella sp. (strain BW2S), that was cultured by serial dilution from the same sample. Cultures were maintained in 15-ml tubes containing ~4 ml of 25%-strength ATCC medium 802, with one sterilized barley grain, angled for aeration and transferred weekly. Cells were examined by light microscopy as described above.

Scanning electron microscopy of H. kukwesjijk. Cells from a ten-day-old culture of strain BW2H were fixed for 30 min in OsO₄ vapour alone (at room temperature) or OsO₄ vapour simultaneously with 2.5% glutaraldehyde (on ice), and filtered onto 2-μm isopore membrane filters (Millipore). These were washed in distilled water and dehydrated in a series of 50–100% ethanol mixtures, critical-point-dried in CO₂ and sputter-coated with 10 nm of gold-palladium. Cells were imaged using a Hitachi S-4700 SEM at 3 kV.

SSU rDNA analyses. A single cell of Spironema cf. multiciliatum was isolated and washed by micropipetting, and then photo-documented (see above). The genomic DNA of this cell was amplified using multiple displacement amplification (Illustra GenomiPhi V3 DNA amplification kit, GE Healthcare). Total genomic DNA was extracted from H. kukwesjijk culture BW2H (also including the prey Spumella sp., strain BW2S) using a Qiagen DNeasy kit. Partial SSU rDNA sequences were PCR-amplified from Spironema cf. multiciliatum and Spumella sp. BW2S using primers 82F (5′-GAAACTGCGAATGGCTC-3′) and 1498R (5′-CACCTACGGAAACCTTGTTA-3′), with annealing temperatures of 58 °C and 55 °C, respectively. A partial Hemimastix SSU rDNA sequence was PCR-amplified from strain BW2H using exact-match primers Hemi2-342F (5′-ACTTTCGATTGTAGGATAGA-3′) and Hemi2-1103R (5′-AAAACTTGCGATTTCTCTGG-3′) with an annealing temperature of 55 °C. All amplicons were directly Sanger-sequenced at Génome Québec. The SSU rDNA of Spumella sp. strain BW2S was 99% identical to Spumella strain 187hm (GenBank accession code: DQ388550).

The SSU rRNA sequences for the two hemimastigotes were extracted from the transcriptome data (see above) and compared to the SSU rDNA sequences obtained independently from genomic DNA, to ensure mutual identity (although the rDNA sequence of H. kukwesjijk did differ from the transcriptome-derived rRNA sequence in having a 395-bp intron). The transcriptome-derived SSU rRNA sequences (and environmental clone AY689797, retrieved from GenBank via megablast) were then added via profile alignment using MUSCLE [32] to a global eukaryotic alignment of SSU rRNA genes (111 taxa total). Following manual inspection of the alignment, poorly aligned sites were masked using Gblocks [33] with subsequent manual correction (1,252 sites retained), and a phylogeny was estimated in RAxML under the ‘GTR + Γ’ model [34] with a 1,000-replicate bootstrap analysis (Extended Data Fig. 3).

Environmental SSU rRNA and rDNA sequence comparisons. Sequences derived from eukaryotic environmental SSU rRNA and rDNA were acquired from VAMPS [35] (V9), TARA Oceans [36] (V9), BioMarKs [37] (V4), a neotropical soil study [38] (V4), a high-Arctic fjord water column study [39] (V4) and a soil metatranscriptome dataset [40], and queried in a BLAST [41] analysis with the appropriate (V4 or V9) section of the Spironema and Hemimastix SSU rRNAs, at an 85% identity cut-off (top 500 hits). The corresponding short reads from the datasets were first aligned to the eukaryote reference alignment (see above) using PaPaRa [42] version 2.5 and then placed on the SSU rRNA gene tree (Extended Data Fig. 3) using pplacer [43] version 1.1. Chimeric reads were identified manually with BLAST against the GenBank non-redundant nucleotide database and discarded (all cases were from VAMPS V9 datasets). Reads were also discarded if the top 100 BLAST hits were all to a single taxonomic group (for example, ciliates). Surviving reads were assigned to Hemimastigophora if they were placed on a particular branch within Hemimastigophora with a likelihood-to-weight ratio > 0.5, or if they had an accumulated likelihood-to-weight ratio > 0.9 across the multiple branches within Hemimastigophora.

Phylogenomic dataset assembly. To perform phylogenomic analyses of eukaryotes that included hemimastigotes, we used the single-cell transcriptomes derived from Hemimastix (2 cells) and Spironema (4 cells), as described above. Raw reads from the Illumina sequencing were quality-trimmed, and the adapters clipped with Trimmomatic version 0.32 [44] (default parameters), then assembled with Trinity [45] version 2.0.2 (default parameters). Assemblies were cleaned of sequencing cross-contamination using a custom script. Marker genes of interest were extracted using a previously reported pipeline [46] and appended as translated peptide sequences to a 396-taxon, 351-gene eukaryote dataset [46]. This dataset was pruned to 107 taxa that broadly represented all major eukaryotic groups for which data were available, while excluding extremely ‘long-branching’ species and, where possible, species with poor sampling of this gene set. The 351 single-gene datasets were aligned individually using MAFFT-L-INS-i [47] version 7.0, and trimmed with BMGE [48] version 1.0 (-m BLOSUM30 -h 0.5 -g 0.2). From the resulting files, single-gene trees were generated with IQ-TREE [49] version 1.4.4 under the LG + C20 + F + Γ model with a 1,000-replicate, ultra-fast bootstrap approximation to estimate branch support [50]. These trees were manually checked for sequences corresponding to probable paralogues, contaminants, or lateral or endosymbiotic gene transfers, which were then removed from the datasets. The tree estimation and manual checking were then repeated, and any additional suspect sequences removed. Three taxa with limited remaining data (<10% of sites) were then excluded, leaving 104 taxa for initial phylogenomic analysis.

Quality of hemimastigote transcriptomes. It was particularly important to assess the quality of the data from Spironema and Hemimastix, both because they were the subject of the study and because they were derived using single-cell methods from crude enrichments. The transcriptome from Spironema included 290 of the 351 genes in the phylogenomic dataset (82.6%) and 77.6% of the sites retained after trimming. The transcriptome from Hemimastix included 280 of the 351 genes (79.7%) and 72.1% of sites. In other words, both transcriptomes were reasonably data-rich from a phylogenomic perspective, and compare well to many transcriptomes from cultivated non-model protists (Supplementary Table 2). In all, 247 of the 351 gene alignments (70.4%) included both taxa. The Spironema and Hemimastix sequences formed a clade in 168 of the 247 (68%) single-gene trees inferred for these data, which is consistent with a specific relationship between the two hemimastigotes, bearing in mind that some of the individual genes in the dataset carry relatively little phylogenetic signal. There was no particular pattern to the relationships between each hemimastigote and other eukaryotes in the remaining 32% of trees. In summary, the single-gene trees indicate that there was little-to-no contamination from other eukaryotes in the analysed hemimastigote data. Furthermore, the Spironema and Hemimastix sequences always differed in these 247 alignments, which confirms that no cross-contamination between the two had carried through to the final dataset.

Phylogenomic analyses. The 351 individual-gene alignments with 104 retained taxa (see above) were concatenated, and trimmed with BMGE (-m BLOSUM30 -h 0.42 -g 1), yielding a dataset that consisted of 104 taxa and 93,798 amino acid sites. To enable more-complex analyses, we then excluded 43 phylogenetically redundant taxa, followed by re-trimming with BMGE (as above), to generate a 61-taxon dataset with 93,903 amino acid sites. Taxa were selected for retention in the 61-taxon dataset such that eukaryote diversity remained reasonably evenly sampled, and that all major taxa that were included in the 104-taxon dataset were still represented. Where there was a choice, species with high gene coverage were retained in preference to species that were more poorly sampled, and shorter-branching species were retained over longer-branching species. Phylogenies for both datasets were inferred by maximum likelihood using IQ-TREE under the LG + C60 + F + Γ mixture model, with robustness assessed by ultra-fast bootstrap approximation (1,000 replicates). The 61-taxon dataset was also subjected to a ‘full’ bootstrap analysis with 200 replicates under the PMSF model, implemented in IQ-TREE. PMSF is a site-heterogeneous mixture model that can closely approximate complex mixture models such as LG + C60 + F + Γ while reducing computational time several-fold [51], making full bootstrapping practical for our ~60-taxon datasets. The maximum likelihood tree that was inferred for this dataset under the LG + C60 + F + Γ model (see above) was used as the guide tree for the PMSF analysis. The 61-taxon dataset was also subjected to Bayesian analysis with PhyloBayes [52] version 4.1 under the CAT + GTR model [53], with default priors and Markov chain Monte Carlo settings. Four independent Markov chain Monte Carlo chains were run for ~10,000 generations. Three chains converged (maximum difference in posterior probability < 0.13; burn-in = 3,000). Their consensus tree shows Hemimastigophora as sister to (other) Diaphoretickes with maximal support (that is, consistent with the maximum likelihood tree), whereas the unconverged chain yielded the topology in which Hemimastigophora is sister to the Sar + Telonema + Haptophyta + Centrohelida grouping.
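The percentages quoted under ‘Quality of hemimastigote transcriptomes’ follow directly from the reported counts. The short Python sketch below is only an arithmetic check on those counts; the variable and function names are illustrative and are not part of the authors' pipeline.

```python
# Counts quoted in the 'Quality of hemimastigote transcriptomes' paragraph.
TOTAL_GENES = 351          # genes in the phylogenomic dataset
SPIRONEMA_GENES = 290      # genes recovered from the Spironema transcriptome
SHARED_ALIGNMENTS = 247    # alignments that contain both hemimastigotes
CLADE_TREES = 168          # single-gene trees in which the two form a clade


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


spironema_genes = percent(SPIRONEMA_GENES, TOTAL_GENES)
shared_alignments = percent(SHARED_ALIGNMENTS, TOTAL_GENES)
clade_trees = percent(CLADE_TREES, SHARED_ALIGNMENTS)
other_trees = 100.0 - clade_trees   # trees with no hemimastigote clade

print(f"Spironema gene coverage: {spironema_genes:.1f}%")
print(f"Alignments with both taxa: {shared_alignments:.1f}%")
print(f"Trees with a hemimastigote clade: {clade_trees:.0f}%")
print(f"Remaining trees: {other_trees:.0f}%")
```

The same helper can be applied to any other count/total pair reported in this section.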

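The rule used under ‘Environmental SSU rRNA and rDNA sequence comparisons’ to assign placed reads to Hemimastigophora can be written as a small predicate. This is a minimal sketch, assuming that each read's placements are available as (branch identifier, likelihood-weight ratio) pairs and that the set of branches inside Hemimastigophora is known; the function name, the data layout and the branch identifiers are hypothetical and do not reproduce pplacer's own output format.

```python
def assign_to_clade(placements, clade_branches, single_min=0.5, total_min=0.9):
    """Decide whether one read is assigned to the clade of interest.

    placements     -- list of (branch_id, likelihood_weight_ratio) pairs,
                      one entry per candidate branch (assumed layout)
    clade_branches -- set of branch ids that lie within the clade
    A read is accepted if a single branch within the clade has a ratio
    above `single_min`, or if the ratios summed over all branches within
    the clade exceed `total_min`.
    """
    in_clade = [lwr for branch, lwr in placements if branch in clade_branches]
    if not in_clade:
        return False
    return max(in_clade) > single_min or sum(in_clade) > total_min


# Hypothetical branch ids 12 and 13 stand for branches within the clade.
hemi_branches = {12, 13}
print(assign_to_clade([(12, 0.7), (40, 0.3)], hemi_branches))
print(assign_to_clade([(12, 0.45), (13, 0.48), (40, 0.07)], hemi_branches))
print(assign_to_clade([(12, 0.3), (40, 0.7)], hemi_branches))
```

The first read passes on a single branch, the second only on the accumulated ratio, and the third fails both thresholds.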

Several further sets of analyses were conducted on derivatives of the 61-taxon dataset. First, we used a custom script to calculate average tip-to-tip distances for each taxon and identify ‘long-branching’ outliers (that is, taxa for which the average tip-to-tip branch lengths were longer than three standard deviations from the centre of the distribution of average branch lengths). Removing the three identified outliers (Bodo, Diplonema and Tetrahymena) yielded the ‘58 taxa, no longest branches’ (58-nLB) dataset. This was analysed using maximum likelihood, as per the main 61-taxon analysis (IQ-TREE with LG + C60 + F + Γ, with 1,000-replicate ultra-fast bootstrap approximation, and 200 bootstraps using PMSF with the LG + C60 + F + Γ maximum likelihood tree for the 58-nLB dataset as the guide tree).

Second, we deleted the three most data-poor taxa, each of which had site coverage < 30% (Telonema, Gromia and the picozoan PB58411a), resulting in a ‘58 taxa, no data-poor species’ (58-nDP) dataset. This was analysed using maximum likelihood as per the main 61-taxon analysis, except that the PMSF bootstrap analysis was based on 100 replicates.

Third, we recoded the main 61-taxon dataset into four distinct categories of amino acids (SR4 scheme [54]), to address possible compositional heterogeneity. The resulting 61-SR4 dataset was analysed with IQ-TREE under a GTR + R6 + F model, with 500 real bootstrap replicates.

Fourth, we used the assignment of per-site rates in IQ-TREE (-wsr flag) for the main 61-taxon dataset, and progressively removed the fastest-evolving sites in 10 steps, with approximately 4% of the sites removed in each step. This yielded 10 ‘stepwise fastest sites removed’ (61-SFSR) datasets. To exclude the influence of the position of Hemimastigophora in the guide trees for subsequent PMSF analyses, we deleted the two hemimastigotes from the full dataset and the 10 SFSR datasets (that is, 11 total) with phyx version 0.1 [55], and pruned these two species from the maximum likelihood tree from the 61-taxon dataset. The pruned tree was then used as the guide tree to calculate PMSF profiles (‘PMSF-nHEMI’) under LG + C60 + F + Γ. For each of the original 11 datasets (that is, datasets that included hemimastigotes), we then inferred support for important bipartitions under this LG + C60 + F + Γ PMSF model using a 1,000-replicate, ultra-fast bootstrap approximation, and plotted these support values against the percentage of sites remaining (Extended Data Fig. 8). This method of generating the PMSF model (PMSF-nHEMI) and evaluating statistical support differs from the main analyses (for example, 61-taxon, 58-nLB or 58-nDP), and the support values cannot be directly compared between these analyses and the 61-SFSR analyses.

Identification of non-universal ancient genes. To search the hemimastigote transcriptome data for gene innovations that potentially originated early in the evolution of crown eukaryotes (and thus may also represent synapomorphies that provide information about the relationships between supergroups), we collated a set of gene systems reported in the literature to include genes with widespread, but not universal, distributions across major eukaryote groups. Specific genes were selected on the basis of their presence in more than one species-rich ‘supergroup’ of eukaryotes, for example both Obazoa and Amoebozoa (see Supplementary Table 3). For this purpose, Metamonada and Discoba were considered distinct supergroups. Sequences were retrieved from GenBank or from the literature, and used as BLASTp queries against both hemimastigote transcriptomes, translated into amino acid sequences using a custom script (default genetic code). Where genes were not identified with BLASTp, hidden Markov model profiles were obtained either from the PFAM database or the literature (as indicated in Supplementary Table 3), or were built de novo from the alignments in the corresponding literature using hmmbuild, and then scanned for in both hemimastigote transcriptomes using hmmscan (both hmmbuild and hmmscan from the Hmmer-3.1b2 package [56]). Genes that were retrieved in only one of the hemimastigote transcriptomes were used as BLASTp queries against the other. Hemimastigote candidate orthologues were verified by reciprocal BLASTp against the nr database and, where appropriate, domain annotation databases (InterProScan and SMART), and then added to pre-existing alignments from corresponding references (as shown in Supplementary Table 3) via profile alignment using MUSCLE in Seaview version 4.6 [32,57]. Where phylogenies were necessary to further confirm identity (particularly in the case of multigene families), the alignments were trimmed using BMGE version 1.1 [48] (-m BLOSUM30), and phylogenies estimated in IQ-TREE version 1.5.5 [49] under the LG4X model. An alignment for HPS1 was not available in the original publication and was instead assembled from sequences from GenBank and publicly available transcriptomes, and aligned via MAFFT-L-INS-i [47]. Because of the large size of the myosin gene family and the level of divergence between various paralogues, myosin homologues were instead aligned with MAFFT-E-INS-i and trimmed less conservatively (BMGE; -m BLOSUM30 -b 2), with the corresponding phylogeny estimated under the LG + C60 + F + Γ model.

Identification of Spironema cf. multiciliatum. The cells we discuss as Spironema cf. multiciliatum have an elongate shape (Fig. 1a, Extended Data Fig. 1a) and the ‘main row’ of flagella is restricted to the anterior portion. These features identify this organism with Spironema rather than Hemimastix (broad and flattened), Paramastix (globular) or Stereonema (elongate but main rows of flagella about half the length of the cell [7,58,59]). There are three previously described species of Spironema: Spironema terricola, Spironema goodeyi and Spironema multiciliatum. The shape and size of our specimens are inconsistent with S. terricola and S. goodeyi, both of which are very long and thin [7]. In addition, neither of these species has any posterior flagella. Our cells are similar in shape to S. multiciliatum [5]. The number of flagella in the ‘main row’ and the presence of a few difficult-to-observe flagella towards the posterior end are also broadly consistent with a previous account of S. multiciliatum, in which such posterior flagella were seen in some cells [5,7]. However, our cells are 23–31 μm in length (mean: 27.4 μm (s.d., 3.45 μm); n = 7; see main text), which is markedly longer than the 18-μm length reported for S. multiciliatum. Thus, we determined that our specimens are similar, but not identical, to S. multiciliatum.

Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Raw reads of Spironema and Hemimastix transcriptomes are deposited in GenBank under accession codes SRR6032743 and SRR6032744, respectively. The assembled Hemimastix and Spironema transcriptomes, 351 individual-gene alignments (104 taxa), concatenated and trimmed alignments and tree-files for the 104-taxon, 61-taxon, 58-nLB, 58-nDP, 61-SR4 and 61-SFSR datasets, alignments and tree files for non-universal ancient genes, raw light microscopy and scanning electron microscopy images, and the SSU rDNA alignment and tree-files have been deposited in Dryad (https://doi.org/10.5061/dryad.n5g39d7). The partial SSU rDNA gene sequence of H. kukwesjijk strain BW2H is deposited in GenBank, under accession code MF682191. This publication has been registered with the ZooBank database (http://zoobank.org/) with the Life Science Identifier urn:lsid:zoobank.org:pub:4BA2A83C-8363-4EBE-A9C7-097CA470F9FB, and the name Hemimastix kukwesjijk has been deposited in ZooBank with the Life Science Identifier urn:lsid:zoobank.org:act:32E12332-A418-40E2-BF4C-F2BFD94BF4CF.

31. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
32. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
33. Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
34. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
35. Huse, S. M. et al. VAMPS: a website for visualization and analysis of microbial population structures. BMC Bioinformatics 15, 41 (2014).
36. de Vargas, C. et al. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605 (2015).
37. BioMarKs Consortium. BioMarKs data portal http://www.biomarks.eu (2011).
38. Mahé, F. et al. Parasites dominate hyperdiverse soil protist communities in Neotropical rainforests. Nat. Ecol. Evol. 1, 0091 (2017).
39. Marquardt, M., Vader, A., Stübner, E. I., Reigstad, M. & Gabrielsen, T. M. Strong seasonality of marine microbial eukaryotes in a high-arctic fjord (Isfjorden, in West Spitsbergen, Norway). Appl. Environ. Microbiol. 82, 1868–1880 (2016).
40. Geisen, S. et al. Metatranscriptomic census of active protists in soils. ISME J. 9, 2178–2190 (2015).
41. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
42. Berger, S. A. & Stamatakis, A. Aligning short reads to reference alignments and trees. Bioinformatics 27, 2068–2075 (2011).
43. Matsen, F. A., Kodner, R. B. & Armbrust, E. V. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics 11, 538 (2010).
44. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
45. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
46. Brown, M. W. et al. Phylogenomics places orphan protistan lineages in a novel eukaryotic super-group. Genome Biol. Evol. 10, 427–433 (2018).
47. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
48. Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).
49. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
50. Minh, B. Q., Nguyen, M. A. T. & von Haeseler, A. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30, 1188–1195 (2013).
51. Wang, H. C., Minh, B. Q., Susko, E. & Roger, A. J. Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst. Biol. 67, 216–235 (2018).


52. Lartillot, N., Lepage, T. & Blanquart, S. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288 (2009).
53. Lartillot, N. & Philippe, H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109 (2004).
54. Susko, E. & Roger, A. J. On reduced amino acid alphabets for phylogenetic inference. Mol. Biol. Evol. 24, 2139–2150 (2007).
55. Brown, J. W., Walker, J. F. & Smith, S. A. Phyx: phylogenetic tools for unix. Bioinformatics 33, 1886–1888 (2017).
56. Eddy, S. R. Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195 (2011).
57. Gouy, M., Guindon, S. & Gascuel, O. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol. Biol. Evol. 27, 221–224 (2010).
58. Foissner, W. & Foissner, I. in An Illustrated Guide to the Protozoa 2nd edn (eds Lee, J. J. et al.) 1185–1186 (Society of Protozoologists and Allen Press, Lawrence, 2002).
59. Zolffel, M. & Skibbe, O. Rediscovery of the multiflagellated protist Paramastix conifera Skuja 1948 (Protista incertae sedis). Nova Hedwigia 65, 443–452 (1997).

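The four-state recoding used for the 61-SR4 dataset in the Methods can be sketched as a simple character mapping. The grouping below is the SR4 alphabet as it is commonly given (AGNPST, CHWY, DEKQR, FILMV) and should be checked against ref. 54; the state labels A, C, G and T, the treatment of gaps and unknown residues, and the function name are assumptions made for this illustration.

```python
# SR4 grouping as commonly given; state labels A/C/G/T are arbitrary.
SR4_GROUPS = {
    "A": "AGNPST",
    "C": "CHWY",
    "G": "DEKQR",
    "T": "FILMV",
}
SR4_MAP = {aa: state for state, members in SR4_GROUPS.items() for aa in members}


def recode_sr4(sequence):
    """Recode an aligned amino acid sequence into four states.

    Gaps ('-') are kept; any other unrecognised character becomes '?'.
    """
    recoded = []
    for residue in sequence.upper():
        if residue == "-":
            recoded.append("-")
        else:
            recoded.append(SR4_MAP.get(residue, "?"))
    return "".join(recoded)


example = recode_sr4("MKV-LSX")
print(example)
```

Each aligned sequence keeps its length, so the recoded alignment has the same number of sites as the original.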

Extended Data Fig. 1 | Light micrographs of studied hemimastigotes. a–m, Spironema cf. multiciliatum (a) and Hemimastix kukwesjijk (b–m) differential interference contrast micrographs of live cells. a, Two views of a Spironema cf. multiciliatum cell, with inset that details the posterior end. Note the nucleus (marked by ‘n’), the detail of one of the posterior flagella (marked by an arrow, in the inset) and small contractile vacuole (cv, in inset), as well as posterior tail (line in inset). b, c, Optical sections through one H. kukwesjijk cell, detailing the notches from which flagella emerge (arrowheads), a section through the capitulum (marked with a ‘c’) and a conspicuous contractile vacuole in the cell posterior (shown in b). d, Surface view of one of the two thecal plates. e–g, Optical cross-sections of different cells showing the capitulum (e), mid-body region with rotationally symmetrical plate overlap (f) and the posterior (g) with radial arrangement of the posterior-most flagella. h–j, Pseudoseries that illustrates the feeding process, showing the progression of prey-ingestion stages. Note the widening capitulum and beginning of formation of the phagocytic vacuole. k, Same cell as in j, showing the anterior flagella curving forward to surround prey (seen especially in early feeding). l, m, Dividing cells, showing the diagonal symmetry of short new rows (nr) and longer old rows (or) of flagella, as well as the daughter nuclei (n). Scale bar, 10 μm.


Extended Data Fig. 2 | Scanning electron microscopy images of H. kukwesjijk. a, Feeding cell, general view (anterior to left; note the prey item attached to capitulum). b, Close-up of anterior end showing ingestion in progress at the capitulum. c, Discharged extrusomes (ex; triggered by the fixation process) along margin of the capitulum (compare to undischarged extrusomes in Fig. 1d). d, Dividing cells, with the left-most cell clearly showing the old row of full-length flagella (or) and the new row with short flagella (nr). Scale bars, 5 μm (a, d), 2 μm (b, c).


Extended Data Fig. 3 | SSU rDNA phylogeny of eukaryotes. Phylogeny inferred from 111 taxa and 1,252 sites under the GTR + Γ model in RAxML. Hemimastigophora, including H. kukwesjijk and Spironema cf. multiciliatum from this study, are shown in red. Colours of other sequence names correspond to the same taxonomic groupings as in Fig. 3. The sequence of Spumella sp. strain BW2S, the prey for H. kukwesjijk, is included and marked with an asterisk. The numbers on branches show bootstrap percentages (1,000 replicates; values below 50% not shown). Branches in grey are half their original length. This tree was the reference phylogeny for pplacer analyses shown in Fig. 2. Scale bar denotes 0.1 expected substitutions per site.


Extended Data Fig. 4 | Unrooted phylogeny of eukaryotes, 104 taxa dataset. Phylogeny inferred from 351 genes, using maximum likelihood under the LG + C60 + F + Γ model. The numbers on branches show ultrafast bootstrap approximation percentages, with filled circles denoting 100% support. The Carpediemonas branch is shown reduced by 1/3 of the original length for display purposes. Scale bar denotes 0.1 expected substitutions per site.


Extended Data Fig. 5 | Unrooted phylogeny using 58-nLB dataset. Phylogeny inferred from 351 genes, using maximum likelihood under the LG + C60 + F + Γ model. The numbers on branches show PMSF bootstrap percentages (bootstrap support PMSF; 200 true bootstrap replicates), then ultrafast bootstrap approximation percentages (1,000 replicates). Filled circles denote 100% support with both methods. Scale bar denotes 0.1 expected substitutions per site.


Extended Data Fig. 6 | Unrooted phylogeny using 58-nDP dataset. Phylogeny inferred from 351 genes, using maximum likelihood under the LG + C60 + F + Γ model. The numbers on branches show PMSF bootstrap percentages (bootstrap support PMSF; 100 true bootstrap replicates), then ultrafast bootstrap approximation percentages (1,000 replicates). Filled circles denote 100% support with both methods. The branches leading to Bodo, Diplonema and Tetrahymena are shown reduced by 1/3. Scale bar denotes 0.1 expected substitutions per site.


Extended Data Fig. 7 | Unrooted phylogeny using 61-SR4 dataset of 61 taxa. Phylogeny inferred from 351 genes, with amino acids recoded as four states, using maximum likelihood under the GTR + R6 + F model. The numbers on branches show bootstrap percentages (500 true bootstrap replicates). Filled circles represent 100% support. The branches leading to Bodo, Diplonema and Tetrahymena are shown reduced by 1/3. Scale bar denotes 0.1 expected substitutions per site.


Extended Data Fig. 8 | Summary of 61-SFSR analysis. Chart follows the support for several important bipartitions with the sequential removal of the fastest-evolving sites from the 61-taxon, 351-gene dataset. The support values are ultra-fast bootstrap approximation percentages (1,000 replicates) inferred using maximum likelihood under the LG + C60 + F + Γ-derived PMSF model using a guide tree pruned of hemimastigotes (PMSF-nHEMI, see Methods); these values are not directly comparable to those from the other illustrated analyses.



nature research | reporting summary
Corresponding author(s): Alastair G.B. Simpson

Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistical parameters
When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main
text, or Methods section).
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.

A description of all covariates tested


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND
variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Clearly defined error bars


State explicitly what error bars represent (e.g. SD, SE, CI)

Our web collection on statistics for biologists may be useful.

Software and code


Policy information about availability of computer code
Data collection Transcriptome: Illumina sequence reads of the transcriptome were quality-checked and trimmed with Trimmomatic v0.32, and assembled with Trinity v. 2.0.2. Sequencing cross-contamination was removed using a custom script by Martin Kolisko (Czech Academy of Sciences), available on GitHub (https://github.com/kolecko007/decros).

Data analysis (A) SSU rRNA/DNA data: (Profile) Alignment: MUSCLE in Seaview v 4.6; Site selection: Gblocks (with manual correction) in Seaview 4.6; Maximum likelihood phylogenetic analysis: RAxML v 8.2.6; Alignment of short environmental sequence reads to full alignment: PaPaRa v 2.5; Phylogenetic placement of short reads: PPlacer v 1.1.
(B) Phylogenomic datasets: Alignment: MAFFT v 7.0; Site selection: BMGE v 1.0; Maximum likelihood phylogenetic analyses: IQ-TREE v 1.4.4; Bayesian inference of phylogeny: PhyloBayes v 4.1. Identification of outlier long-branch taxa for a derivative phylogenetic analysis was done with a short script available on request (to L. Eme). Taxon removal from datasets as part of the derivative fast-site removal phylogenetic analysis was done with Phyx v.0.1.
(C) Gene identity analysis: HMM building and searches: Hmmer-3.1b2. Alignment: MUSCLE in Seaview v. 4.6 (profile alignment) or MAFFT v 7.0 (full alignment). Where phylogenies were performed: Site selection: BMGE v 1.1 (full alignment); phylogenetic analysis: IQ-TREE v 1.5.5.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers
upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data



Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
Raw reads of Spironema and Hemimastix transcriptomes are deposited on Genbank under accession IDs SRR6032743 and SRR6032744, respectively. The assembled
Hemimastix and Spironema transcriptomes, 351 individual gene alignments (104-taxa), concatenated and trimmed alignments and tree-files for the 104-taxa, 61-
taxa, 58-nLB, 58-nDP, 61-SR4 and 61-SFSR datasets, alignments and tree files for non-universal ancient genes, raw LM and SEM images, and the SSU rDNA alignment
and tree-files are deposited on Datadryad doi:10.5061/dryad.n5g39d7. The partial SSU rDNA gene sequence of Hemimastix kukwesjijk strain BW2H is deposited on
Genbank under accession ID MF682191. Hemimastix kukwesjijk has been deposited in the ZooBank database (http://zoobank.org/) with Life Science Identifier
urn:lsid:zoobank.org:pub:4BA2A83C-8363-4EBE-A9C7-097CA470F9FB.

Field-specific reporting
Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf

Life sciences study design


All studies must disclose on these points even when the disclosure is negative.
Sample size NOT RELEVANT TO STUDY - There were no experimental groups in this study. [The only point where we have sample sizes is when
determining lengths of our cells, which is part of the descriptive component of the study (i.e. no further analyses used these data). n = 61 for
the cultivated strain of Hemimastix, which is 2x the standard normally employed for measurements; n = 7 for uncultivated Spironema,
which is all the cells we could isolate and image from the sample.]

Data exclusions A small number of individual genes were excluded from inclusion in the phylogenomic datasets, if they were determined to be likely
contaminations, paralogs or similar spurious data. This determination was made from single-gene phylogenetic trees for each of the genes
included. This is reported in the methods and is a standard and necessary procedure for eukaryote-wide phylogenomic analyses.

Replication NOT RELEVANT TO STUDY - The analyses were all performed on fixed sets of sequence data and, for all intents and purposes, are intrinsically
reproducible. Convergence behaviour of the MCMC Bayesian phylogenetic analysis is reported in the methods.

Randomization NOT RELEVANT TO STUDY - There were no experimental groups to assign individuals to at any point in the study

Blinding NOT RELEVANT TO STUDY - There were no experimental groups to assign individuals to at any point in the study

Reporting for specific materials, systems and methods

Materials & experimental systems Methods


n/a Involved in the study n/a Involved in the study
Unique biological materials ChIP-seq
Antibodies Flow cytometry
Eukaryotic cell lines MRI-based neuroimaging
Palaeontology
Animals and other organisms
Human research participants

Unique biological materials


Policy information about availability of materials
Obtaining unique materials Two new microbial eukaryote (protist) cultures (Hemimastix kukwesjijk BW2H; Spumella sp. BW2S) were newly established
during this study (and are described in the methods). They are available from the authors on request (as a matter of basic
scientific ethics). Note that these are not 'eukaryote cell lines' in the normal sense of the term, thus we have marked "n/a" for
'Eukaryotic cell lines'.

Animals and other organisms


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research
Laboratory animals The study did not involve laboratory animals

Wild animals The study did not involve wild animals

Field-collected samples A single small field-collection of soil was the source of the cells and cultures examined in this study. The sample was kept
hydrated at room temperature and ambient light for 4 weeks prior to, and during, the isolations and observations of unicellular
protists reported in the study
