
Zhao et al.

BMC Genomics 2014, 15:419


http://www.biomedcentral.com/1471-2164/15/419

RESEARCH ARTICLE Open Access

Comparison of RNA-Seq by poly (A) capture, ribosomal RNA depletion, and DNA microarray for expression profiling
Wei Zhao1,2, Xiaping He2,3, Katherine A Hoadley2,3, Joel S Parker2,3, David Neil Hayes3,5 and Charles M Perou1,2,3,4*

Abstract
Background: RNA sequencing (RNA-Seq) is often used for transcriptome profiling as well as the identification of
novel transcripts and alternative splicing events. Typically, RNA-Seq libraries are prepared from total RNA using
poly(A) enrichment of the mRNA (mRNA-Seq) to remove ribosomal RNA (rRNA); however, this method fails to capture
non-poly(A) transcripts or partially degraded mRNAs. Hence, an mRNA-Seq protocol is poorly suited to RNAs
from Formalin-Fixed and Paraffin-Embedded (FFPE) samples.
Results: To address the desire to perform RNA-Seq on FFPE materials, we evaluated two different library preparation
protocols that could be compatible for use with small RNA fragments. We obtained paired Fresh Frozen (FF) and
FFPE RNAs from multiple tumors and subjected these to different gene expression profiling methods. We tested
11 human breast tumor samples using: (a) FF RNAs by microarray, mRNA-Seq, Ribo-Zero-Seq and DSN-Seq
(Duplex-Specific Nuclease) and (b) FFPE RNAs by Ribo-Zero-Seq and DSN-Seq. We also performed these different
RNA-Seq protocols using 10 TCGA tumors as a validation set.
The data from paired RNA samples showed high concordance in transcript quantification across all protocols and
between FF and FFPE RNAs. In both FF and FFPE, Ribo-Zero-Seq removed rRNA with efficiency comparable to that of
mRNA-Seq, and it provided equivalent or less biased coverage of gene 3′ ends. Compared to mRNA-Seq where
69% of bases were mapped to the transcriptome, DSN-Seq and Ribo-Zero-Seq contained significantly fewer reads
mapping to the transcriptome (20-30%); in these RNA-Seq protocols, many if not most reads mapped to intronic
regions. Approximately 14 million reads in mRNA-Seq and 45–65 million reads in Ribo-Zero-Seq or DSN-Seq were
required to achieve the same gene detection levels as a standard Agilent DNA microarray.
Conclusions: Our results demonstrate that compared to mRNA-Seq and microarrays, Ribo-Zero-Seq provides
equivalent rRNA removal efficiency, coverage uniformity, genome-based mapped reads, and consistently high
quality quantification of transcripts. Moreover, Ribo-Zero-Seq and DSN-Seq have consistent transcript quantification
using FFPE RNAs, suggesting that RNA-Seq can be used with FFPE-derived RNAs for gene expression profiling.
Keywords: RNA sequencing, FFPE, RNA depletion, Ribo-zero, Gene expression, Microarray

* Correspondence: [email protected]
1 Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA
2 Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA
Full list of author information is available at the end of the article

© 2014 Zhao et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain
Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article,
unless otherwise stated.

Background
The use of massively parallel sequencing for gene expression profiling is known as RNA sequencing (RNA-Seq). RNA-Seq has had an enormous impact on gene expression studies. Compared to hybridization-based technologies like DNA microarrays, it provides consistent quantification, is superior in dynamic range and sampling depth, and is independent of pre-existing sequence information [1,2]. RNA-Seq can be used for traditional transcriptome profiling [3,4], identification of novel transcripts [5], identification of expressed SNPs [6,7], alternative splicing, and the detection of gene fusion events [8-11].
To allow for mRNA/gene detection, highly abundant ribosomal RNAs (rRNAs) must be removed from total RNA before sequencing. One standard solution is to enrich for the polyadenylated (poly(A)) RNA transcripts (so-called mRNA-Seq) with oligo (dT) primers, similar to how DNA microarrays are primed; however, this method eliminates all non-poly(A) RNAs in addition to rRNAs. Recent studies suggested that certain non-poly(A) RNAs, either non-coding or protein coding, are functionally important [12-15]. Moreover, mRNA-Seq poorly captures partially degraded mRNAs; hence it is not an optimal method when the starting materials are Formalin-Fixed and Paraffin-Embedded (FFPE) samples, because the RNAs from FFPE are degraded to a small average size [16]. To overcome these challenges, several rRNA depletion protocols have been developed. The Ribo-Zero method removes rRNA through hybridization capture of rRNA followed by binding to magnetic beads for subtraction. Another method involves Duplex-Specific Nuclease (DSN) degradation by the C0t-kinetics-based normalization method to deplete abundant sequences that reanneal quickly, such as those derived from the highly abundant rRNAs and tRNAs [17]. In this study, we examined rRNA-depleted libraries from total RNA of fresh-frozen (FF) and FFPE samples sequenced by mRNA-Seq, Ribo-Zero-Seq and DSN-Seq and compared these results across methods and with conventional DNA microarrays.

Results
To rigorously evaluate the feasibility of reproducible gene expression profiling using RNA from clinically relevant FFPE materials, we collected FFPE and fresh-frozen (FF) tumor RNAs for matched sets of tumors from two different sources (UNC and TCGA). Most tumors were subjected to gene expression profiling using six different methods that included: 1) Agilent DNA microarrays using FF RNA, 2) mRNA-Seq using FF RNA, 3) Ribo-Zero-Seq using FF RNA, 4) DSN-Seq using FF RNA, 5) Ribo-Zero-Seq using FFPE RNA, and 6) DSN-Seq using FFPE RNA; see Figure 1 for a comparison of each RNA-Seq protocol and the number of samples tested for each protocol. Analytical comparisons were focused on several features including rRNA depletion efficiency, genome alignment profile, transcriptome coverage, transcript quantification accuracy and reproducibility, gene expression patterns and differential gene expression, as well as coverage of annotated genes at different sequencing depths.

rRNA depletion efficiency
The efficiency of rRNA removal is a key factor in maximizing reads mapping to transcripts, because if left alone, rRNAs make up >80-90% of the total RNA of an unenriched sample [18]. Due to the nature of rRNA sequences, many rRNA short reads will produce poor alignments; hence, the estimation of absolute abundance of rRNA based on whole genome alignment tends to underestimate rRNA amounts. Thus we evaluated the relative level of rRNA components across protocols by comparing the levels to those observed in mRNA-Seq. Ribo-Zero-Seq reduced rRNA levels to a similar order of magnitude as mRNA-Seq in both FF and FFPE RNA, while the rRNA fraction in DSN-Seq libraries was significantly higher (p < 0.001) and showed greater variation, particularly within the FFPE samples (Table 1). Consistent with the analysis of the UNC dataset, Ribo-Zero-Seq provided the same rRNA removal efficiency as mRNA-Seq in the TCGA samples; the level of rRNA reduction observed here for the Ribo-Zero-Seq protocol was similar to that reported by the company that makes the Ribo-Zero kit (data not shown).

Genome alignment profile
The precision of RNA-Seq gene quantification is directly dependent on the number of reads that are mapped to transcripts; thus we first assessed the fraction of reads aligning to the reference human genome UCSC hg19 (Table 1). In FF samples, mRNA-Seq and Ribo-Zero-Seq provided comparable percentages of nucleotide bases mapping to the genome (94.0%, 93.8%), while DSN-Seq aligned a smaller percentage (85.5%). In FFPE samples, Ribo-Zero-Seq and DSN-Seq both had good performance in alignment on average (81.5% in Ribo-Zero-Seq-FFPE, 93.5% in DSN-Seq-FFPE); TCGA samples had a similar result for both FF and FFPE (Table 1). Compared to FF, the FFPE samples tended to exhibit greater variation in the % aligned, most likely related to the more variable quality of FFPE RNAs.

Transcriptome coverage
The coverage of the transcriptome directly affects the accuracy of transcript abundance estimation and the sensitivity of transcript detection, which are two critical features of all gene expression studies. Therefore, we evaluated two features of the transcriptome coverage: (a)

(A)
mRNA-Seq:      Purified Total RNA -> Poly-A Selection -> RNA Fragmentation* -> cDNA Synthesis -> Adapter Ligation & PCR
Ribo-Zero-Seq: Purified Total RNA -> Hybridization/bead capture -> RNA Fragmentation* -> cDNA Synthesis -> Adapter Ligation & PCR
DSN-Seq:       Purified Total RNA -> RNA Fragmentation* -> cDNA Synthesis -> Adapter Ligation & PCR -> DSN Normalization

* RNA Fragmentation only applies to fresh-frozen samples.

(B)
Sample source  Tissue type   mRNA-Seq  Ribo-Zero-Seq    DSN-Seq  Agilent DNA microarray
UNC            Fresh-frozen  11        11               10       11
UNC            FFPE                    8                4
TCGA           Fresh-frozen  10        6                0
TCGA           FFPE                    10+8 replicates  10
Figure 1 Schematic overview of the rRNA removal protocols and list of samples tested. (A) mRNA-Seq, Ribo-Zero-Seq and DSN-Seq library
preparation protocols are shown, with the key steps to remove the rRNA from the library shown in italics. The full protocol was applied to the
fresh-frozen (FF) samples, and a similar alternative protocol was applied to FFPE samples (omitting steps marked as *). (B) The list of samples
tested by each RNA-Seq library protocol and their source.

relative coverage of exons, introns, and intergenic regions, and (b) uniformity of transcript coverage.

(a) Relative coverage of exons, introns, and intergenic regions
In FF samples, bases mapping to transcripts (i.e. coding and UTR regions) constituted 62.3% of total bases in mRNA-Seq, while a marked reduction was observed in the two rRNA-depletion protocols (31.5% in Ribo-Zero-Seq and 22.7% in DSN-Seq, Figure 2A). Conversely, bases mapping to intronic and intergenic regions increased from 31.6% in mRNA-Seq to 62.5% in DSN-Seq and Ribo-Zero-Seq. In FFPE samples, DSN-Seq and Ribo-Zero-Seq provided similar coverage profiles, where ~20% of bases were mapped to the transcriptome and >60% to intronic or intergenic regions. These results were concordant with those observed in the TCGA sample set (Figure 2B).
We further investigated the coverage across individual genes (Additional file 1: Figure S1A, GATA3 as an example). In mRNA-Seq, most reads mapped almost exclusively to exons, and the coverage of intronic regions was low and comparable to the intergenic background. In

Table 1 Analysis of performance for multiple RNA-Seq methods

                                  mRNA-Seq        RiboZero-Seq    DSN-Seq         RiboZero-FFPE   DSN-FFPE
UNC dataset
Sample size                       11              11              10              8               4
% rRNA relative to mRNA-seq       1               5.04            116             7.14            585
                                  (1-1)           (1.42-8.66)     (78.9-154)      (3.48-10.8)     (−347-1,517)
% Aligned bases                   94              93.8            85.5            81.5            93.5
                                  (91.5-96.5)     (92-95.5)       (82.6-88.4)     (71-92)         (92.2-94.8)
Median CV coverage                0.533           0.525           0.56            0.744           0.929
                                  (0.506-0.56)    (0.505-0.545)   (0.549-0.57)    (0.713-0.775)   (0.814-1.04)
Median 5′ to 3′ bias              0.27            0.64            0.209           0.356           0.242
                                  (0.189-0.35)    (0.493-0.788)   (0.143-0.275)   (0.285-0.427)   (0.0329-0.451)
Pearson correlation to microarray 0.851           0.832           0.855           0.636           0.7
                                  (0.825-0.878)   (0.809-0.854)   (0.84-0.871)    (0.601-0.671)   (0.628-0.771)
TCGA dataset
Sample size                       10              6               NA              18              10
% rRNA relative to mRNA-seq       1               11.2            NA              0.935           41.7
                                  (1-1)           (1.51-20.9)                     (0.631-1.24)    (22.1-61.3)
% Aligned bases                   96.4            95.0            NA              93.4            93.2
                                  (95.4-97.5)     (93.9-96.2)                     (91.6-95.2)     (90.7-95.8)
Median CV coverage                0.534           0.478           NA              0.83            0.953
                                  (0.517-0.551)   (0.458-0.499)                   (0.791-0.869)   (0.896-1.01)
Median 5′ to 3′ bias              0.309           0.46            NA              0.417           0.157
                                  (0.244-0.374)   (0.37-0.551)                    (0.253-0.581)   (0.0856-0.229)
Five different analyses were performed in order to assess the capabilities of the different RNA-Seq protocols. These included: 1) % rRNA relative to mRNA-Seq;
2) % Aligned bases; 3) Median CV coverage; 4) Median 5′ to 3′ bias; 5) The Pearson correlation coefficient between the RNA-Seq library methods and the same
samples assayed by DNA microarray in the UNC dataset.
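
The two coverage metrics listed in Table 1 can be illustrated with a minimal sketch. It assumes that per-base read depth along each transcript is already available as an array. The function names, the 100-base end window, and the way the ends are averaged are illustrative assumptions, since the text does not specify how the end regions were defined; only the CV definition, the 5′-relative-to-3′ direction of the ratio, and the use of a median over transcripts come from the text.

```python
import numpy as np

def coverage_cv(depth):
    """Coefficient of variation (std / mean) of per-base depth for one transcript."""
    depth = np.asarray(depth, dtype=float)
    mean = depth.mean()
    if mean == 0:
        return float("nan")  # no coverage: CV is undefined
    return float(depth.std() / mean)

def five_to_three_ratio(depth, window=100):
    """Mean depth over the first `window` bases divided by that over the last `window` bases.

    `window` is a hypothetical parameter; it is capped at half the transcript length.
    """
    depth = np.asarray(depth, dtype=float)
    window = min(window, len(depth) // 2)
    if window < 1:
        return float("nan")
    five_prime = depth[:window].mean()
    three_prime = depth[-window:].mean()
    return float(five_prime / three_prime) if three_prime > 0 else float("nan")

def median_over_transcripts(metric, depths_by_transcript):
    """Median of a per-transcript metric, ignoring undefined values."""
    return float(np.nanmedian([metric(d) for d in depths_by_transcript]))
```

Under these assumptions, a library summary would be obtained by calling `median_over_transcripts` with the depth arrays of the 1000 most highly expressed transcripts.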

contrast, in Ribo-Zero-Seq and DSN-Seq there was a more continuous coverage of both exons and introns, although the coverage of intergenic regions was more similar to what was seen with mRNA-Seq. This unique profile suggests that the rRNA depletion protocol may capture pre-mRNAs in addition to mature mRNAs. To test this hypothesis, we examined the pile-up profile of a few individual genes and identified reads that spanned exon-intron boundaries in the Ribo-Zero-Seq and DSN-Seq protocols (Additional file 1: Figure S1B, see red arrows for spanning reads).

(b) Uniformity of transcript coverage
We next determined the evenness of transcript coverage by comparing the median coefficient of variation (CV) for the read coverage of the 1000 most highly expressed transcripts (Table 1). In FF libraries, mRNA-Seq and Ribo-Zero-Seq had significantly lower CV than DSN-Seq (mRNA-Seq: p < 0.001, Ribo-Zero-Seq: p = 0.002), indicating a more uniform coverage across the full length of transcripts. In the FFPE libraries, there was an increase in CV in both protocols. Ribo-Zero-Seq-FFPE had slightly higher variation than the result reported in Adiconis et al. [19], while DSN-Seq-FFPE had the highest CV among all protocols.
Another measure of transcript coverage is the variation at 5′ and 3′ ends. We evaluated the ratio of coverage at the 5′ end relative to the 3′ end for the 1000 most highly expressed transcripts (Table 1). Previous studies have shown that the poly(A)-capture strategy shows substantially more reads from the 3′ ends of transcripts. Our analysis revealed that on FF, Ribo-Zero-Seq provided a less biased 5′-to-3′ coverage ratio than mRNA-Seq (p < 0.001), while DSN-Seq made no significant improvement. In FFPE samples, both protocols performed similarly to mRNA-Seq with respect to 5′-to-3′ bias.

Transcript quantification and reproducibility
RNA-Seq poly(A) enrichment strategies yield an accurate and reproducible measurement of transcript abundance with a wide dynamic range [1,4,20,21]. Given the advantages of profiling multiple types of RNA species (i.e. mRNAs, lincRNAs, snoRNAs, etc.), it is critical to evaluate the performance of mRNA quantification in total RNA-Seq protocols. To determine the possible

[Figure 2 graphic: stacked bars showing, for each protocol, the fraction (0.0 to 1.0) of bases that were unaligned, intergenic, intronic, or coding+UTR. Totals shown beneath the bars:
(A) mRNA-Seq 276,022,089 (n=11); Ribo-Zero-Seq 370,713,695 (n=11); DSN-Seq 194,050,441 (n=10); Ribo-Zero-FFPE 244,462,167 (n=8); DSN-FFPE 319,752,764 (n=4).
(B) mRNA-Seq 172,813,162 (n=10); Ribo-Zero-Seq 347,357,453 (n=6); Ribo-Zero-FFPE 214,333,286 (n=18); DSN-FFPE 151,443,151 (n=10).]

Figure 2 Genome alignment profiles. The percentage of nucleotide bases mapping to three different regions of the genome: exonic/protein
coding and UTR (green), intronic (yellow), intergenic (red), and the percentage of unmapped bases (purple). The data is shown separately for the
UNC (A) and TCGA (B) datasets.

concordance of RNA-Seq with data generated by older genomic profiling platforms, we compared the gene expression levels of RNA-Seq data with that of Agilent DNA microarray data that were assayed using the same RNAs. With specific and standard gene filtering criteria [22], we detected 16,975 expressed Entrez genes by custom Agilent 244,000 feature microarrays, with 15,206 genes detected by both microarray and RNA-Seq across our paired samples. In FF samples, gene abundance measurements by all protocols of RNA-Seq were highly correlated with the microarray data (Pearson > 0.8, Table 1). In FFPE samples, RNA-Seq measurements were lower but also significantly correlated with FF microarray (Pearson ~0.7, Table 1), which is at a level similar to that observed when comparing concordance between Agilent and Affymetrix microarrays [23].
We next examined the correlation of transcript abundance across the different RNA-Seq protocols. There was greater concordance and fewer outliers than when compared to the microarray data (Figure 3A and B). Among FF tissues, the correlation was >0.9 for all pairwise, sample-matched comparisons. DSN-Seq and Ribo-Zero-Seq on FFPE were less correlated with FF mRNA-Seq (>0.8), but still higher than the correlation observed with microarrays. The two rRNA depletion protocols were the most highly correlated in both FF and FFPE samples (Pearson correlation 0.961 in FF and 0.934 in FFPE). The correlation plots for an individual sample (breast tumor 020678B) are shown in Figure 3C.
Additional quality assessments were made on the TCGA dataset, to account for the fact that a much smaller set of reads was mapped to the transcriptome in RNA depletion protocols. We generated eight technical replicates with the Ribo-Zero-Seq-FFPE protocol to balance the total number of transcriptome reads for the comparison with FF mRNA-Seq. The assessment of technical reproducibility suggested that these FFPE replicates were indistinguishable (Pearson = 0.991). The correlation between Ribo-Zero-Seq on FF and FFPE as well as between Ribo-Zero-Seq-FFPE replicate pairs has also been confirmed in Norton et al. [24].
Lastly, we applied Deming regression to estimate a statistically unbiased slope to determine the relative sensitivity of protocol pairs (Figure 3D). A slope of 1

[Figure 3 graphic. Panels (A) and (B): Pearson correlation coefficients (axis 0.0 to 1.0) for each pair of protocols, with the number of samples per pair. Panel (C): scatter plots for mRNA-Seq vs. DSN-Seq, mRNA-Seq vs. Ribo-Zero, DSN-Seq vs. Ribo-Zero, DSN-Seq vs. DSN-FFPE, Ribo-Zero vs. Ribo0-FFPE, DSN-FFPE vs. Ribo0-FFPE, mRNA-Seq vs. DSN-FFPE, and mRNA-Seq vs. Ribo0-FFPE. Panel (D): Deming regression slopes (axis 0.7 to 1.1) for each pair of protocols, with the number of samples per pair.]

Figure 3 Comparison of gene quantification concordance across RNA-Seq library protocols. Pearson correlation coefficients of RNA-Seq
library pairs in the (A) UNC and (B) TCGA datasets. (C) Scatter plots of libraries of each pair of protocols for breast tumor sample 020578B. (D)
Deming regression slope for pairs of RNA-Seq libraries in the UNC dataset. A slope of 1 indicates the equivalent sensitivity of the two libraries,
whereas a smaller value is indicative of a higher sensitivity of the first term/method in the pair.
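
As an illustration of the slope shown in panel (D), the following is a minimal sketch of Deming regression, which allows for measurement error in both variables. The function name, the default error-variance ratio `delta = 1.0` (orthogonal regression), and which protocol is placed on which axis are assumptions made here for illustration; the text does not state the settings that were used.

```python
import numpy as np

def deming_fit(x, y, delta=1.0):
    """Return (slope, intercept) of a Deming regression of y on x.

    delta is the assumed ratio var(error in y) / var(error in x);
    delta = 1 gives orthogonal regression.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    diff = syy - delta * sxx
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = y.mean() - slope * x.mean()
    return float(slope), float(intercept)
```

In such a setup, `x` and `y` would hold matched gene-level expression values (for example on a log scale) from the two libraries being compared.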

indicates the equivalent sensitivity of the two libraries, whereas a smaller value is indicative of a higher sensitivity of the first protocol in the pair. mRNA-Seq exhibited its superiority over all the other protocols in terms of sensitivity, with a slope less than 1 in all the pair-wise comparisons. In addition, DSN-Seq and

Ribo-Zero-Seq both have higher sensitivity in FF samples than in FFPE.

Gene expression patterns and differential gene expression
Hierarchical clustering analysis provides a global examination of whether biologically relevant expression signatures are consistently measured by distinct protocols. In this example, we tested whether the same sample assayed by different protocols “paired” or “partnered” together; if so, then this is a very high level of assay validation as not only are the overall subtype expression profiles maintained, but also the profiles that are unique to that sample are maintained. We performed hierarchical clustering analysis of the RNA-Seq data using a previously published ‘intrinsic gene list’ [25] (Additional file 2: Figure S2) and a set of 904 human breast tumor samples that consists of the 88 UNC and TCGA samples described here and 725 additional breast tumors and 91 normal breast tissues with mRNA-Seq from TCGA. 41/44 samples of the UNC tumor dataset were tightly co-clustered with their partner sample originating from the same tumor, and these clustered with other TCGA tumors based upon each tumor’s subtype profile. The 3/44 non-clustered samples were all prepared by Ribo-Zero-Seq on FFPE samples and their partner DSN-Seq samples on FFPE were not available. In the TCGA dataset, 40/44 samples were tightly co-clustered with their partners (i.e. libraries constructed from the same tumor using a different sequencing protocol); the four samples that were not clustered were on a separate branch, but were moderately correlated with their partner samples (correlation > 0.6).
To further evaluate the capability of RNA-Seq protocols to detect biologically relevant differential expression signals, we performed Significance Analysis of Microarray (SAM) on three Basal-like and three Luminal samples of the UNC dataset that were sequenced by mRNA-Seq, Ribo-Zero-Seq and DSN-Seq respectively. Comparison of the top 500 most variably expressed genes between Basal-like and Luminal tumors revealed that each pair of RNA-Seq protocols shared about 350 differentially expressed genes, and more than 300 genes were consistently identified by all the protocols (Additional file 3: Figure S3, Additional file 4: Table S1).
As another test of data quality, we determined the differentially expressed gene set in FF mRNA-Seq vs. Ribo-Zero-Seq and FF Ribo-Zero-Seq vs. DSN-Seq using Significance Analysis of Microarray (SAM). We identified 410 genes with a FDR of 0 that were differentially expressed between mRNA-Seq and Ribo-Zero-Seq (Additional file 5: Table S2A and B); this list was enriched with snoRNAs and histone RNAs that were more highly expressed in the Ribo-Zero-Seq samples. Many of these RNAs do not possess poly(A) tails, and therefore are not targeted by poly(A) selection in mRNA-Seq. Conversely, 104 genes at a FDR of 0 were identified to be differentially expressed between Ribo-Zero-Seq and DSN-Seq libraries (Additional file 5: Table S2C and D); among these, 38 genes were lowly quantified by DSN-Seq and most of these genes were snoRNAs and histone RNAs, which tend to exist at high abundance in total RNAs. Since DSN-Seq removes the most highly abundant components via C0t kinetics, these RNAs may also be subject to depletion in the DSN protocol relative to Ribo-Zero, which uses beads to capture only the rRNAs.

Coverage of annotated genes at different sequencing depths
Compared to hybridization-based methods, the cost per sample by RNA-Seq is still higher. The utilization of multiplexing techniques provides a strategy to further lower the costs. However, too much multiplexing will inhibit the ability to detect lowly expressed genes; therefore, we sought to determine the minimal number of reads required to provide the same transcriptome coverage as provided by an Agilent DNA microarray. The ENCODE Consortium guidelines and other studies have provided insights into the sufficient RNA-Seq coverage and depth for studies of various design goals [26], but these efforts were primarily focused on experiments with FF samples prepared by poly(A)-enrichment protocols. Here we extended the investigation to rRNA depletion approaches and FFPE samples.
We applied a simulation-based method on the pooled data of each protocol. The UCSC known gene reference database (GAF 2.1) includes 20,531 (non-ribosomal) genes. To reduce the noise, we only counted genes as present if they had three or more reads. Using the average number of genes detected on our Agilent microarrays as the baseline (n = 16,975), 13.5 million reads from FF mRNA-Seq libraries would allow detection of the same number of genes (Figure 4), which is consistent with previous studies [26]. In the DSN-Seq and Ribo-Zero-Seq FF libraries, and Ribo-Zero-Seq-FFPE libraries, 35-65 M reads were required to provide the same transcriptome coverage. Only the DSN-Seq-FFPE library required a much larger number of input reads (90 M).

Discussion
The growing popularity of RNA-Seq makes it one of the more desired methods to explore the transcriptome. Preparing RNA-Seq libraries with poly(A) enrichment provides an accurate method to characterize mRNAs, which is functionally equivalent to what DNA microarrays have been accomplishing for more than a decade. However, certain biologically relevant RNA species that do not possess poly(A) tails are largely undetected using a poly

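[Figure 4 graphic: number of genes detected (y-axis, 8,000 to 20,000) against the number of sampled total reads (x-axis, 0 to 200 million) for mRNA-Seq, Ribo-Zero-Seq, DSN-Seq, Ribo-Zero-Seq-FFPE and DSN-Seq-FFPE, with the microarray level drawn as a horizontal line and read depths of 13.5, 35, 45, 65 and 90 million marked.]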

Figure 4 Determination of the number of reads needed for each RNA-Seq protocol to equal a DNA microarray. The number of detected
genes at different levels of sequencing depth is displayed relative to the number of genes detected via DNA microarray (dashed horizontal line).
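
One way to produce a depth curve of this kind is to down-sample a pooled library and count the genes that still reach the detection threshold. The sketch below is an assumption about how such a simulation could be set up, not the authors' procedure: the multinomial sampling scheme, the function name, and the `unassigned_reads` argument (reads that map outside annotated genes, such as intronic or intergenic reads) are hypothetical. Only the threshold of at least 3 reads per gene is taken from the text.

```python
import numpy as np

def genes_detected_at_depth(gene_counts, unassigned_reads, depth, min_reads=3, seed=0):
    """Down-sample a pooled library to `depth` total reads and count detected genes.

    gene_counts      : reads assigned to each gene in the pooled data
    unassigned_reads : reads in the pool not assigned to any gene
    depth            : number of total reads to draw (int)
    min_reads        : a gene counts as detected with at least this many reads
    """
    rng = np.random.default_rng(seed)
    pool = np.append(np.asarray(gene_counts, dtype=float), float(unassigned_reads))
    sampled = rng.multinomial(depth, pool / pool.sum())
    # The last entry holds reads drawn from the unassigned fraction; drop it.
    return int((sampled[:-1] >= min_reads).sum())

# Toy pooled counts (hypothetical values) evaluated at increasing depths.
toy_counts = [5000, 800, 40, 6, 1]
curve = [genes_detected_at_depth(toy_counts, 10000, d) for d in (100, 1000, 10000)]
```

Because total reads are drawn from genes and non-genic regions together, a protocol with a larger non-genic fraction needs a greater total depth to reach the same number of detected genes, which matches the qualitative pattern described in the text.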

(A) selection protocol. In addition, FFPE samples, such as those collected as part of standard medical practice, also require library preparation methods that do not rely on the intact poly(A) structure due to the highly degraded nature of the FFPE RNA. In this study, we demonstrate that a Ribo-Zero-Seq protocol using either fresh-frozen (FF) or FFPE samples eliminates rRNA with good efficiency. In evaluation of a possible coverage bias, 5′-to-3′ bias was reduced in FF Ribo-Zero-Seq libraries, as the protocol does not rely on a poly(A) selection step.
One major distinction across these various protocols is the coverage of the transcriptome. To more directly investigate the relationship between sequencing depth and transcriptome coverage, we performed a simulation in which mRNA-Seq was the most cost-effective strategy to equal a microarray in terms of total genes detected, with a minimum of ~13.5 million reads needed. For the same transcriptome coverage, the reads required for Ribo-Zero-Seq in FF and FFPE and DSN-Seq in FF were 35-65 M reads. However, rRNA depletion protocols also appear to measure immature transcripts (pre-mRNAs) and therefore provide more information on splicing patterns and possible splice junctions. Thus to achieve the same level of exonic reads as FF mRNA-Seq, one needs to sequence 2–4 times the number of reads in rRNA-depletion on FFPE RNA libraries.
Despite fewer of the total reads mapping to exonic regions and a greater number of transcripts being detected, we did not observe a marked decrease in the correlation between microarray and RNA-Seq in rRNA-depleted libraries, where Ribo-Zero-Seq and DSN-Seq were found to be highly consistent in gene quantification. Our evaluation of the quantitative consistency of RNA-Seq on FFPE with microarray may be limited in two aspects: (a) the quality of a few UNC FFPE samples was less satisfactory, and (b) not all the tumors had RNA-Seq data on matched FFPE samples that passed our quality control available for this analysis. Yet we still observed very good correlations with microarray data for those samples with complete FFPE data, which gave correlation values nearly identical to those seen when comparing an Agilent microarray versus an Affymetrix microarray [23].
Given the consistent quantification, mRNA-Seq and rRNA depletion protocols exhibited their merits in different aspects. In the set of genes detected by all the protocols, mRNA-Seq provided the highest sensitivity in detecting differentially expressed genes, which was likely due to the greater fraction of reads mapping to the transcriptome. On the other hand, Ribo-Zero-Seq detected about 550 more annotated genes than mRNA-Seq (Additional file 6: Table S3). With a much greater set of reads mapping to the intergenic and intronic regions in rRNA depletion protocols, the number of additional transcripts detected with the new protocols may be expected to be greater than our conservative estimation here. As shown in another recent study [26], we also expect more novel transcripts to be identified from the rRNA depletion methods.

The very good quantification performance of the protocols on FFPE samples is of significant impact for researchers with clinical samples. Our results demonstrate that Ribo-Zero-Seq had high technical reproducibility on FFPE RNAs and high concordance with FF RNAs. Though the quantification of FFPE was less correlated to FF mRNA-Seq, the two rRNA depletion methods provided highly consistent gene profiles on FFPE. Thus, it is the quality of FFPE RNA samples, rather than the robustness of method, that likely contributes more to the variation of performance with respect to gene quantification. The hierarchical clustering analysis also validated that the biologically-based intrinsic gene profiles were present and highly correlated between FF and FFPE. Hence, we suggest that it is possible to apply the rRNA depletion protocols to FFPE samples and achieve quantitative accuracies comparable with standard genome profiling techniques that use FF tissues and RNAs.

Conclusions
In this study, we demonstrated that compared to mRNA-Seq, Ribo-Zero-Seq provides equivalent rRNA removal efficiency, coverage uniformity, genome-based mapped reads, and reduces 5′-to-3′ bias. In addition, both Ribo-Zero-Seq and DSN-Seq provide highly consistent quantification of transcripts when compared to microarrays or mRNA-Seq, and substantially more information on non-poly(A) RNA. Moreover, the two rRNA depletion methods have consistent transcript quantification using FFPE RNAs and show high reproducibility.

Methods
RNA samples
We constructed RNA-Seq libraries using eleven UNC breast tumor samples using different sample preparation protocols including: (a) FF RNA samples by mRNA-Seq, Ribo-Zero-Seq and DSN-Seq and (b) FFPE samples by Ribo-Zero-Seq and DSN-Seq (Figure 1B). One of the FF-DSN samples, 3 of the FFPE-Ribo-Zero samples, and 7 of the FFPE-DSN samples failed sequencing QC (i.e. too few reads) and were not included in the study. To augment the UNC sample set, we also tested an additional sample set of FF and FFPE samples collected as part of the TCGA project, where total RNA of ten tumors, including 6 breast tumors and 4 prostate tumors, were prepared in three ways: (a) FF samples with mRNA-Seq, (b) FFPE with Ribo-Zero-Seq and 8 technical replicates, and (c) FFPE with DSN-Seq. In addition, we prepared FF samples for 6 of the 10 TCGA tumors with Ribo-Zero-Seq protocol (Figure 1B). All library construction and sequencing were performed at UNC for both the UNC and TCGA samples. For fresh-frozen tissues, we isolated total RNA with Qiagen RNeasy mini kit. For FFPE samples, total RNA was isolated using Roche High Pure RNA paraffin kit, Cat# 03270289001. The extent of RNA degradation was assessed using a BioAnalyzer (Agilent).

Library construction and sequencing
mRNA-Seq library: Illumina TruSeq™ RNA Sample Prep Kit (Cat# RS-122-2001) was used with 1 µg of total RNA for the construction of libraries according to the manufacturer’s protocol. Ribo-Zero library: rRNA was removed from FF or FFPE total RNA using Epicentre’s Ribo-Zero rRNA Removal kit (Cat# RZH11042). For FF samples, 30-100 ng Ribo-Zero RNA was used for the construction of the library using the Illumina TruSeq™ RNA Sample Prep Kit (Cat# RS-122-2001) and followed the manufacturer’s instruction, except for omitting the purification step before fragmentation. For FFPE samples, 30-100 ng Ribo-Zero RNA was then incubated with Random Primers (Invitrogen, Cat# 48190011) at 65°C for 5 minutes then Illumina TruSeq™ RNA Sample Prep Kit (Cat# RS-122-2001) was used to construct the library according to the manufacturer’s protocol from the step of First Strand cDNA Synthesis. DSN library: Illumina TruSeq™ RNA Sample Prep Kit (Cat# RS-122-2001) was used with 100 ng of total RNA for the construction of libraries following the manufacturer’s protocol, except for omitting the purification of mRNA step in FF samples, and the purification and fragmentation step in FFPE samples. The total RNA libraries went through DSN treatment and PCR enrichment according to Illumina DSN Normalization Sample Preparation Guide (http://supportres.illumina.com/documents/myillumina/7836bd3e-3358-4834-b2f7-80f80acb4e3f/dsn_normalization_sample-prep_application_note_15014673_c.pdf). Sequencing: All cDNA libraries were sequenced using an Illumina HiSeq2000, producing 48x7x48 bp paired-end reads with multiplexing.

Read processing and alignment
All samples were processed and filtered as described in The Cancer Genome Atlas [27]. Bases and QC assessment of sequencing were generated by CASAVA 1.8. QC-passed reads were aligned to the NCBI build 37 (hg19) human reference genome using MapSplice v12_07 [9]. The alignment profile was determined by Picard Tools v1.64 (http://picard.sourceforge.net/). The aligned reads were sorted and indexed using SAMtools, and then translated to transcriptome coordinates and filtered for indels, large inserts, and zero mapping quality using UBU v1.0 (https://github.com/mozack/ubu). For the reference transcriptome, UCSC hg19 GAF2.1 for KnownGenes [28] was used, with genes located on non-standard chromosomes removed. The abundance of transcripts was then estimated using an Expectation-Maximization algorithm
implemented in the software package RSEM [29] v1.1.13. Estimated counts were transformed by upper quartile normalization prior to comparison of expression across protocols.

Identification of RNA-Seq library complexity and random sampling
The RNA-Seq data was filtered by requiring the gross RSEM count to be ≥3 for each gene. For each protocol, the detected gene sets were defined as genes that were reported in >70% tumor lanes and with 3 or more reads. To determine the amount of input reads needed for sufficient transcriptome coverage, a simulation test was performed on the UNC data. A series of fixed numbers of reads was randomly selected from each protocol by drawing without replacement. For all the resampling levels, the simulated data followed the same alignment and filtering pipeline as described above. Gene sets detected were then identified for all the various levels.

Gene expression comparison methods
For all the FF tumors and the Common Reference Sample, Agilent 244,000 feature whole genome microarrays were hybridized with tumor RNAs (Cy5) and a human common reference (Cy3) and lowess normalized as described in Herschkowitz et al. [30]. In the RNA-Seq data, the detected gene sets were identified as above (i.e. 3 or more reads in >70% of samples). The log2 ratio of RNA-Seq tumor samples to RNA-Seq human Common Reference Sample (which was the same RNA used for the 2-color microarrays) was determined. Pearson correlation was determined and a Student’s t-test was applied to evaluate the difference of RNA-Seq protocols in their consistency to microarray.
The RNA-Seq gene quantification data was next filtered by gene counts as above. The log2 transformed abundance of tumor samples was reported and was used to derive the correlation between RNA-Seq protocol pairs. Using R package MethComp, Deming regression was applied to compare the sensitivity in detecting differentially expressed genes. An unpaired two-class SAM analysis was used to identify genes that have differential expression level in a) mRNA-Seq versus Ribo-Zero-Seq, and b) Ribo-Zero-Seq versus DSN-Seq.
Gene expression quantification by microarray and RNA-Seq for all samples new to this manuscript can be found in GEO database under accession GSE51783. Aligned BAM files are available at dbGaP under the series ID of phs000676.v1.p1. TCGA sample RNA-Seq data is available at cgHub (BAM files, https://cghub.ucsc.edu/) and DCC (expression level data, https://tcga-data.nci.nih.gov/tcga/).

Additional files
Additional file 1: Figure S1. Visual display of the reads aligning to GATA3. (A) Read pile-up plots of GATA3 in Sample 020578B showing data for five different RNA-Seq libraries. (B) Close-up of the read mapping identifying reads that span exon-intron boundaries, which identify unspliced mRNA species.
Additional file 2: Figure S2. Intrinsic gene set clustering analysis. Hierarchical cluster using a breast cancer intrinsic gene set (~2000 genes) and 88 breast tumor samples prepared using the multiple protocols, with an additional 816 samples from the TCGA Breast Cancer Project (725 tumors and 91 normal tissues). The rows above the heat map identify the 88 samples from this study, their RNA-Seq protocol type, and the red arrows show the location of the few mismatched samples.
Additional file 3: Figure S3. Comparison of the top 500 differentially expressed genes between Basal and Luminal tumors detected by mRNA-Seq, Ribo-Zero-Seq and DSN-Seq.
Additional file 4: Table S1. Comparison of the top 500 differentially expressed genes between Basal and Luminal tumors detected by mRNA-Seq, Ribo-Zero-Seq and DSN-Seq. The list of 307 differentially expressed genes that are identified by SAM analysis in all the three protocols.
Additional file 5: Table S2. Differentially expressed gene list across RNA-Seq protocols obtained from Significance Analysis of Microarray. (A, B) SAM analysis comparison of mRNA-Seq versus Ribo-Zero-Seq using an FDR = 0. (A) Uniquely expressed genes in Ribo-Zero-Seq. (B) Lowly expressed genes in Ribo-Zero-Seq. (C, D) SAM analysis comparison of Ribo-Zero-Seq versus DSN-Seq using an FDR = 0. (C) Uniquely expressed genes in DSN-Seq. (D) Lowly expressed genes in DSN-Seq.
Additional file 6: Table S3. Comparison of genes detected by mRNA-Seq and Ribo-Zero-Seq in FF samples. (A) Genes detected only by mRNA-Seq and (B) genes detected only by Ribo-Zero-Seq.

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
WZ, DNH, and CMP designed the study. WZ, KAH and CMP drafted the paper. WZ performed the data analysis. XH carried out the library preparation and the RNA sequencing. KAH participated in the study design and coordination. KAH, DNH, and JSP contributed to bioinformatics analysis. All authors read and approved the final manuscript.

Acknowledgements
We thank Gary Schroth, Shujun Luo and Illumina for sharing with us their DSN protocol and for helpful comments. Some of this work was conducted as part of TCGA, a project of the National Cancer Institute and the National Human Genome Research Institute; we thank the TCGA Network for discussions and contributions to this manuscript. This work was supported by funds from NIH U24-CA143848, SAIC contract # X13-1092, and the NCI Breast SPORE (P50-CA58223-09A1).

Author details
1Curriculum in Bioinformatics and Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA. 2Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA. 3Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA. 4Department of Pathology & Laboratory Medicine, The University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA. 5Department of Internal Medicine, Division of Medical Oncology, The University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA.

Received: 24 February 2014 Accepted: 30 May 2014 Published: 2 June 2014
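The detection filter and random-sampling simulation described under "Identification of RNA-Seq library complexity and random sampling" can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it assumes every read has already been assigned to a gene, whereas the study re-ran alignment and filtering on each subsample. The function names and the toy library are hypothetical.

```python
import random
from collections import Counter


def detected_genes(counts_by_sample, min_reads=3, min_fraction=0.70):
    """Genes with at least `min_reads` reads in more than `min_fraction` of samples."""
    n_samples = len(counts_by_sample)
    hits = Counter()
    for counts in counts_by_sample:
        for gene, n in counts.items():
            if n >= min_reads:
                hits[gene] += 1
    return {gene for gene, k in hits.items() if k / n_samples > min_fraction}


def subsample_counts(read_genes, n_reads, rng):
    """Draw `n_reads` reads without replacement and tally them per gene."""
    return Counter(rng.sample(read_genes, n_reads))


# Toy library (hypothetical): one list entry per read, holding its gene label.
library = ["GENE_A"] * 60 + ["GENE_B"] * 30 + ["GENE_C"] * 10
rng = random.Random(0)

# Number of genes passing the filter at each simulated depth,
# using five independent subsamples per depth as stand-in "lanes".
genes_by_depth = {}
for depth in (10, 50, 100):
    lanes = [subsample_counts(library, depth, rng) for _ in range(5)]
    genes_by_depth[depth] = detected_genes(lanes)
```

Plotting `len(genes_by_depth[depth])` against `depth` gives the kind of saturation curve used to judge how many reads are needed for a given transcriptome coverage.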
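The Methods name three numerical steps for comparing platforms: upper quartile normalization of estimated counts, log2 ratios of tumor samples to the Common Reference Sample, and Pearson correlation. A minimal standard-library sketch of those steps is given below. Several details are assumptions rather than statements from the paper: the scaling target of 1000, the use of non-zero counts for the quartile, the pseudocount of 1 that avoids taking the log of zero, and all function names.

```python
import math
import statistics


def upper_quartile_normalize(counts, target=1000.0):
    """Scale counts so the 75th percentile of non-zero values equals `target`."""
    nonzero = [c for c in counts.values() if c > 0]
    q3 = statistics.quantiles(nonzero, n=4, method="inclusive")[2]
    return {gene: c * target / q3 for gene, c in counts.items()}


def log2_ratios(tumor, reference, pseudocount=1.0):
    """Per-gene log2(tumor / reference) over the genes present in both."""
    shared = sorted(set(tumor) & set(reference))
    return {
        g: math.log2((tumor[g] + pseudocount) / (reference[g] + pseudocount))
        for g in shared
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, the per-gene log2 ratios from an RNA-Seq protocol and from the microarray would be aligned on their shared genes and passed to `pearson` as two lists.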

References
1. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008, 18:1509–1517.
2. Guo Y, Sheng Q, Li J, Ye F, Samuels DC, Shyr Y: Large scale comparison of gene expression levels by microarrays and RNAseq using TCGA data. PLoS One 2013, 8:e71462.
3. The Cancer Genome Atlas Network: Integrated genomic characterization of endometrial carcinoma. Nature 2013, 497:67–73.
4. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5:621–628.
5. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, et al: Landscape of transcription in human cells. Nature 2012, 489:101–108.
6. Quinn EM, Cormican P, Kenny EM, Hill M, Anney R, Gill M, Corvin AP, Morris DW: Development of strategies for SNP detection in RNA-seq data: application to lymphoblastoid cell lines and evaluation using 1000 Genomes data. PLoS One 2013, 8:e58815.
7. Piskol R, Ramaswami G, Li JB: Reliable identification of genomic variants from RNA-Seq data. Am J Hum Genet 2013, 93:641–651.
8. Chao H-H, He X, Parker JS, Zhao W, Perou CM: Micro-scale genomic DNA copy number aberrations as another means of mutagenesis in breast cancer. PLoS One 2012, 7:e51719.
9. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, He X, Mieczkowski P, Grimm SA, Perou CM, MacLeod JN, Chiang DY, Prins JF, Liu J: MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res 2010, 38:e178.
10. Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, Schmidt D, O’Keeffe S, Haas S, Vingron M, Lehrach H, Yaspo M-L: A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 2008, 321:956–960.
11. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ: Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 2008, 40:1413–1415.
12. Esteller M: Non-coding RNAs in human disease. Nat Rev Genet 2011, 12:861–874.
13. Fatica A, Bozzoni I: Long non-coding RNAs: new players in cell differentiation and development. Nat Rev Genet 2013, 15:7–21.
14. Du Z, Fei T, Verhaak RGW, Su Z, Zhang Y, Brown M, Chen Y, Liu XS: Integrative genomic analyses reveal clinically relevant long noncoding RNAs in human cancer. Nat Struct Mol Biol 2013, 20:908–913.
15. Akrami R, Jacobsen A, Hoell J, Schultz N, Sander C, Larsson E: Comprehensive analysis of long non-coding RNAs in ovarian cancer reveals global patterns and targeted DNA amplification. PLoS One 2013, 8:e80306.
16. Mullins M, Perreard L, Quackenbush JF, Gauthier N, Bayer S, Ellis M, Parker J, Perou CM, Szabo A, Bernard PS: Agreement in breast cancer classification between microarray and quantitative reverse transcription PCR from fresh-frozen and formalin-fixed, paraffin-embedded tissues. Clin Chem 2007, 53:1273–1279.
17. Zhulidov PA, Bogdanova EA, Shcheglov AS, Vagner LL, Khaspekov GL, Kozhemyako VB, Matz MV, Meleshkevitch E, Moroz LL, Lukyanov SA, Shagin DA: Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res 2004, 32:e37.
18. O’Neil D, Glowatz H, Schlumpberger M: Ribosomal RNA depletion for efficient use of RNA-seq capacity. Curr Protoc Mol Biol 2013, Chapter 4:Unit 4.19.
19. Adiconis X, Borges-Rivera D, Satija R, DeLuca DS, Busby MA, Berlin AM, Sivachenko A, Thompson DA, Wysoker A, Fennell T, Gnirke A, Pochet N, Regev A, Levin JZ: Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods 2013, 10:623–629.
20. ’t Hoen PAC, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen RHAM, de Menezes RX, Boer JM, van Ommen G-JB, den Dunnen JT: Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res 2008, 36:e141.
21. Oshlack A, Robinson MD, Young MD: From RNA-seq reads to differential expression results. Genome Biol 2010, 11:220.
22. The Cancer Genome Atlas Network: Comprehensive molecular portraits of human breast tumours. Nature 2012, 490:61–70.
23. Verhaak RGW, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR, Ding L, Golub T, Mesirov JP, Alexe G, Lawrence M, O’Kelly M, Tamayo P, Weir BA, Gabriel S, Winckler W, Gupta S, Jakkula L, Feiler HS, Hodgson JG, James CD, Sarkaria JN, Brennan C, Kahn A, Spellman PT, Wilson RK, Speed TP, Gray JW, Meyerson M, et al: Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010, 17:98–110.
24. Norton N, Sun Z, Asmann YW, Serie DJ, Necela BM, Bhagwate A, Jen J, Eckloff BW, Kalari KR, Thompson KJ, Carr JM, Kachergus JM, Geiger XJ, Perez EA, Thompson EA: Gene expression, single nucleotide variant and fusion transcript discovery in archival material from breast tumors. PLoS One 2013, 8:e81925.
25. Parker JS, Mullins M, Cheang MCU, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z, Quackenbush JF, Stijleman IJ, Palazzo J, Marron JS, Nobel AB, Mardis E, Nielsen TO, Ellis MJ, Perou CM, Bernard PS: Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 2009, 27:1160–1167.
26. Wang Y, Ghaffari N, Johnson CD, Braga-Neto UM, Wang H, Chen R, Zhou H: Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens. BMC Bioinformatics 2011, 12(Suppl 10):S5.
27. The Cancer Genome Atlas Network: Comprehensive genomic characterization of squamous cell lung cancers. Nature 2012, 489:519–525.
28. Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hinrichs AS, Learned K, Lee BT, Li CH, Raney BJ, Rhead B, Rosenbloom KR, Sloan CA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ: The UCSC genome browser database: 2014 update. Nucleic Acids Res 2014, 42(Database issue):D764–D770.
29. Li B, Dewey CN: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 2011, 12:323.
30. Herschkowitz JI, Zhao W, Zhang M, Usary J, Murrow G, Edwards D, Knezevic J, Greene SB, Darr D, Troester MA, Hilsenbeck SG, Medina D, Perou CM, Rosen JM: Comparative oncogenomics identifies breast tumors enriched in functional tumor-initiating cells. Proc Natl Acad Sci 2012, 109(8):2778–2783.

doi:10.1186/1471-2164-15-419
Cite this article as: Zhao et al.: Comparison of RNA-Seq by poly (A) capture, ribosomal RNA depletion, and DNA microarray for expression profiling. BMC Genomics 2014 15:419.
