Methods in
Molecular Biology 1783
Nalini Raghavachari
Natàlia Garcia-Reyero Editors
Gene
Expression
Analysis
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK
For further volumes:
http://www.springer.com/series/7651
Gene Expression Analysis
Methods and Protocols
Edited by
Nalini Raghavachari
Division of Geriatrics and Clinical Gerontology, National Institute on Aging, Bethesda, MD, USA
Natàlia Garcia-Reyero
Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA
Editors
Nalini Raghavachari
Division of Geriatrics and Clinical Gerontology
National Institute on Aging
Bethesda, MD, USA
Natàlia Garcia-Reyero
Environmental Laboratory
US Army Engineer Research and Development Center
Vicksburg, MS, USA
ISSN 1064-3745 ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-4939-7833-5 ISBN 978-1-4939-7834-2 (eBook)
https://doi.org/10.1007/978-1-4939-7834-2
Library of Congress Control Number: 2018941461
© Springer Science+Business Media, LLC, part of Springer Nature 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations
and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to
be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty,
express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC part of
Springer Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Preface
Rapid advances in genomic technologies and computational tools are revolutionizing the
field of transcriptomics, which is the study of the transcriptome, or the genes transcribed
from genomic DNA. The applications of these techniques are endless. From understanding
development and disease to environmental monitoring and community analysis, the number of
published manuscripts using gene expression to tackle different questions has been
increasing exponentially over the last decades. The methods and techniques are becoming
more accurate and the computational tools more precise. We can now understand what
happens at the single-cell level, which was unthinkable just a few years ago. It is an exciting
era with fast advances and new applications that help us understand life.
This volume of the Methods in Molecular Biology series presents a collection of chapters
for both experimental and bioinformatics approaches related to different aspects of gene
expression analysis. Each chapter begins with an introduction or background of the protocol
or technology being described. When appropriate, the Materials section lists all the reagents
and other materials needed. A detailed step-by-step description of the protocol used is
provided, with the goal of communicating all the practical steps necessary to successfully
perform the protocol. In order to provide more useful information, a Notes section is also
supplied, with tips and suggestions to best perform the protocol or overcome problems that
might arise. In addition to the regular chapters, we have a few review chapters that talk about
the current and future trends in gene expression analyses both in wetlab and data analysis.
This particular book addresses protocols and techniques related to gene expression. It is
divided into three main sections: background chapters, wetlab protocols, and bioinformatics
approaches. We have assembled an exciting array of chapters tackling cutting-edge
techniques, such as single-cell gene expression, highly multiplexed amplicon sequencing,
multi-omics techniques, targeted sequencing, and epigenetics. Gene expression analysis is
broadly used worldwide and has an immense array of applications. Furthermore, the techniques
and methods used are rapidly changing. They are also becoming increasingly precise,
informative, often challenging, and applicable to many different fields. That is why we hope that
these detailed chapters will provide useful information to researchers worldwide. We greatly
appreciate all the authors’ contributions; without their efforts and expertise this book would
not have been possible. We hope that this volume will be a valuable part of many
laboratories, hopefully getting old on the bench rather than staying flawless on the library shelf.
Bethesda, MD, USA Nalini Raghavachari
Vicksburg, MS, USA Natàlia Garcia-Reyero
Contents
Preface
Contributors
1 Overview of Gene Expression Analysis: Transcriptomics
Nalini Raghavachari and Natàlia Garcia-Reyero
2 RNA-Seq and Expression Arrays: Selection Guidelines for Genome-Wide Expression Profiling
Jessica Minnier, Nathan D. Pennock, Quichen Guo, Pepper Schedin, and Christina A. Harrington
3 A Guide for Designing and Analyzing RNA-Seq Data
Aniruddha Chatterjee, Antonio Ahn, Euan J. Rodger, Peter A. Stockwell, and Michael R. Eccles
4 SureSelectXT RNA Direct: A Technique for Expression Analysis Through Sequencing of Target-Enriched FFPE Total RNA
Jennifer Carter Jones, Alex P. Siebold, Carolina Becker Livi, and Anne Bergstrom Lucas
5 Simultaneous, Multiplexed Detection of RNA and Protein on the NanoString® nCounter® Platform
Sarah Warren
6 Transcript Profiling Using Long-Read Sequencing Technologies
Anthony Bayega, Yu Chang Wang, Spyros Oikonomopoulos, Haig Djambazian, Somayyeh Fahiminiya, and Jiannis Ragoussis
7 Making and Sequencing Heavily Multiplexed, High-Throughput 16S Ribosomal RNA Gene Amplicon Libraries Using a Flexible, Two-Stage PCR Protocol
Ankur Naqib, Silvana Poggi, Weihua Wang, Marieta Hyde, Kevin Kunstman, and Stefan J. Green
8 MicroRNA Expression Analysis: Next-Generation Sequencing
Poching Liu
9 Identification of Transcriptional Regulators That Bind to Long Noncoding RNAs by RNA Pull-Down and RNA Immunoprecipitation
Xiangbo Ruan, Ping Li, and Haiming Cao
10 Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits
Haibiao Gong, Devin Do, and Ramesh Ramakrishnan
11 Current and Future Methods for mRNA Analysis: A Drive Toward Single Molecule Sequencing
Anthony Bayega, Somayyeh Fahiminiya, Spyros Oikonomopoulos, and Jiannis Ragoussis
12 Expression Profiling of Differentially Regulated Genes in Fanconi Anemia
Binita Zipporah E, Kavitha Govarthanan, Pavithra Shyamsunder, and Rama S. Verma
13 A Review of Transcriptome Analysis in Pulmonary Vascular Diseases
Dustin R. Fraidenburg and Roberto F. Machado
14 Differential Gene Expression Analysis of Plants
Mark Arick II and Chuan-Yu Hsu
15 High Throughput Sequencing-Based Approaches for Gene Expression Analysis
R. Raja Sekhara Reddy and M. V. Ramanujam
16 Network Analysis of Gene Expression
Roby Joehanes
17 Analysis of ChIP-Seq and RNA-Seq Data with BioWardrobe
Sushmitha Vallabh, Andrey V. Kartashov, and Artem Barski
18 Bayesian Network to Infer Drug-Induced Apoptosis Circuits from Connectivity Map Data
Jiyang Yu and Jose M. Silva
Index
Contributors
ANTONIO AHN Department of Pathology, Dunedin School of Medicine, University of Otago,
Dunedin, New Zealand
MARK ARICK II Institute for Genomics, Biocomputing & Biotechnology, Mississippi State
University, Mississippi State, MS, USA
ARTEM BARSKI Division of Allergy and Immunology, Cincinnati Children’s Hospital
Medical Center, Cincinnati, OH, USA; Division of Human Genetics, Cincinnati
Children’s Hospital Medical Center, Cincinnati, OH, USA; Department of Pediatrics,
University of Cincinnati College of Medicine, Cincinnati, OH, USA
ANTHONY BAYEGA Department of Human Genetics, McGill University and Genome Quebec
Innovation Centre, McGill University, Montréal, QC, Canada
HAIMING CAO Cardiovascular Branch, National Heart, Lung and Blood Institute, NIH,
Bethesda, MD, USA
ANIRUDDHA CHATTERJEE Department of Pathology, Dunedin School of Medicine, University
of Otago, Dunedin, New Zealand; Maurice Wilkins Centre for Molecular Biodiscovery,
Auckland, New Zealand
HAIG DJAMBAZIAN Department of Human Genetics, McGill University and Genome Quebec
Innovation Centre, McGill University, Montréal, QC, Canada
DEVIN DO Fluidigm Corporation, South San Francisco, CA, USA
MICHAEL R. ECCLES Department of Pathology, Dunedin School of Medicine, University of
Otago, Dunedin, New Zealand; Maurice Wilkins Centre for Molecular Biodiscovery,
Auckland, New Zealand
SOMAYYEH FAHIMINIYA Department of Human Genetics, McGill University and Genome
Quebec Innovation Centre, McGill University, Montréal, QC, Canada; Cancer Research
Program, The Research Institute of the McGill University Health Centre, Montreal, QC,
Canada
DUSTIN R. FRAIDENBURG Division of Pulmonary, Critical Care, Sleep and Allergy,
Department of Medicine, University of Illinois at Chicago, Chicago, IL, USA
NATÀLIA GARCIA-REYERO Environmental Laboratory, US Army Engineer Research and
Development Center, Vicksburg, MS, USA
HAIBIAO GONG Fluidigm Corporation, South San Francisco, CA, USA
KAVITHA GOVARTHANAN Department of Biotechnology, Bhupat and Jyoti Mehta School of
Biosciences, Stem Cell and Molecular Biology Lab, Indian Institute of Technology Madras,
Chennai, India
STEFAN J. GREEN DNA Services Facility, Research Resources Center, University of Illinois at
Chicago, Chicago, IL, USA
QUICHEN GUO Department of Cell, Developmental and Cancer Biology, Oregon Health
and Science University, Portland, OR, USA
CHRISTINA A. HARRINGTON Department of Molecular and Medical Genetics, Integrated
Genomics Laboratory, Oregon Health and Science University, Portland, OR, USA
CHUAN-YU HSU Institute for Genomics, Biocomputing & Biotechnology, Mississippi State
University, Mississippi State, MS, USA
MARIETA HYDE DNA Services Facility, Research Resources Center, University of Illinois at
Chicago, Chicago, IL, USA
ROBY JOEHANES Hebrew SeniorLife, Beth Israel Deaconess Medical Center, Harvard
Medical School, Boston, MA, USA
JENNIFER CARTER JONES Agilent Technologies, Santa Clara, CA, USA
ANDREY V. KARTASHOV Division of Allergy and Immunology, Cincinnati Children’s
Hospital Medical Center, Cincinnati, OH, USA
KEVIN KUNSTMAN DNA Services Facility, Research Resources Center, University of Illinois
at Chicago, Chicago, IL, USA
PING LI Cardiovascular Branch, National Heart, Lung and Blood Institute, NIH,
Bethesda, MD, USA
POCHING LIU DNA Sequencing and Genomics Core—NHLBI, National Institute
of Health, Bethesda, MD, USA
CAROLINA BECKER LIVI Agilent Technologies, Santa Clara, CA, USA
ANNE BERGSTROM LUCAS Agilent Technologies, Santa Clara, CA, USA
ROBERTO F. MACHADO Department of Medicine, Division of Pulmonary, Critical Care,
Sleep and Allergy, University of Illinois at Chicago, Chicago, IL, USA; Department
of Medicine, Division of Pulmonary, Critical Care, Sleep, and Occupational Medicine,
Indiana University School of Medicine, Indianapolis, IN, USA
JESSICA MINNIER School of Public Health, Oregon Health and Science University, Portland,
OR, USA
ANKUR NAQIB DNA Services Facility, Research Resources Center, University of Illinois
at Chicago, Chicago, IL, USA
SPYROS OIKONOMOPOULOS Department of Human Genetics, McGill University and
Genome Quebec Innovation Centre, McGill University, Montréal, QC, Canada
NATHAN D. PENNOCK Department of Cell, Developmental and Cancer Biology, Oregon
Health and Science University, Portland, OR, USA
SILVANA POGGI DNA Services Facility, Research Resources Center, University of Illinois
at Chicago, Chicago, IL, USA
NALINI RAGHAVACHARI Division of Geriatrics and Clinical Gerontology, National Institute
on Aging, Bethesda, MD, USA
JIANNIS RAGOUSSIS Department of Human Genetics, McGill University and Genome Quebec
Innovation Centre, McGill University, Montréal, QC, Canada; Department of
Bioengineering, McGill University, Montréal, QC, Canada; Department of Biochemistry,
Center of Innovation in Personalized Medicine, Cancer and Mutagen Unit, King Fahd
Center for Medical Research, King Abdulaziz University, Jeddah, Saudi Arabia
R. RAJA SEKHARA REDDY Clevergene Biocorp Private Limited, Bangalore, Karnataka,
India
RAMESH RAMAKRISHNAN Fluidigm Corporation, South San Francisco, CA, USA; Dovetail
Genomics LLC, Santa Cruz, CA, USA
M. V. RAMANUJAM Clevergene Biocorp Private Limited, Bangalore, Karnataka, India
EUAN J. RODGER Department of Pathology, Dunedin School of Medicine, University of
Otago, Dunedin, New Zealand; Maurice Wilkins Centre for Molecular Biodiscovery,
Auckland, New Zealand
XIANGBO RUAN Cardiovascular Branch, National Heart, Lung and Blood Institute, NIH,
Bethesda, MD, USA
PEPPER SCHEDIN Department of Cell, Developmental and Cancer Biology, Knight Cancer
Institute, Oregon Health and Science University, Portland, OR, USA; Young Women’s
Breast Cancer Translational Program, University of Colorado Anschutz Medical Campus,
Aurora, CO, USA
PAVITHRA SHYAMSUNDER Cancer Science Institute, NUS, Singapore, Singapore
ALEX P. SIEBOLD Agilent Technologies, Santa Clara, CA, USA
JOSE M. SILVA Department of Pathology, Icahn School of Medicine at Mount Sinai,
The Mount Sinai Hospital, New York, NY, USA
PETER A. STOCKWELL Department of Biochemistry, University of Otago, Dunedin, New
Zealand
SUSHMITHA VALLABH Division of Allergy and Immunology, Cincinnati Children’s Hospital
Medical Center, Cincinnati, OH, USA
RAMA S. VERMA Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences,
Stem Cell and Molecular Biology Lab, Indian Institute of Technology Madras, Chennai,
India
WEIHUA WANG DNA Services Facility, Research Resources Center, University of Illinois
at Chicago, Chicago, IL, USA
YU CHANG WANG Department of Human Genetics, McGill University and Genome
Quebec Innovation Centre, McGill University, Montréal, QC, Canada
SARAH WARREN NanoString Technologies, Seattle, WA, USA
JIYANG YU Department of Computational Biology, St. Jude Children’s Research Hospital,
Memphis, TN, USA
BINITA ZIPPORAH E Department of Biotechnology, Bhupat and Jyoti Mehta School of
Biosciences, Stem Cell and Molecular Biology Lab, Indian Institute of Technology Madras,
Chennai, India
Chapter 1
Overview of Gene Expression Analysis: Transcriptomics
Nalini Raghavachari and Natàlia Garcia-Reyero
Abstract
Currently, the study of the transcriptome is widely used to interpret the functional elements of the genome
and the molecular constituents of cells and tissues in an effort to unravel biological pathways associated with
development and disease. Advances in genomic technologies now enable comprehensive
transcriptional characterization of mRNA, miRNA, lncRNA, and small RNA in a robust and successful
manner. Transcriptomic strategies are gaining momentum across diverse areas of biological, plant science,
medical, clinical, and pharmaceutical research for biomarker discovery and for disease diagnosis and prognosis.
Key words Transcriptomics, mRNA, Noncoding RNA, miRNA, QPCR, RNA-seq, Epigenetics
1 Introduction
The biological activities of cells, tissues, and organisms are based on
the central dogma of molecular biology. The “central dogma of
molecular biology,” as defined by Francis Crick, states that the
blueprint of life is digitally preserved within DNA and that this
information is sequentially transcribed via messenger RNA and
ultimately translated into protein within a biological system [1]
(Fig. 1). Rapid advancements in biochemical assays, genomic
tools and technologies, and computing power have revolutionized
the ability to interrogate DNA, RNA, and protein rapidly, on a
global scale, and at unprecedented resolution and depth,
leading to the creation of the fields of genomics, transcriptomics,
and proteomics. Transcriptomics, which is the study of the
transcriptome, or the genes that are transcribed from the genomic
DNA, reveals the complete set of transcripts in a cell, and their
quantity, for a specific developmental stage or physiological
condition [2]. Since the transcriptome reflects the state of gene
expression under a given condition, it is highly dynamic and responsive to
external perturbations [2]. The relative ease of measuring gene
expression levels and the dynamic nature of RNA have propelled
Nalini Raghavachari and Natàlia Garcia-Reyero (eds.), Gene Expression Analysis: Methods and Protocols,
Methods in Molecular Biology, vol. 1783, https://doi.org/10.1007/978-1-4939-7834-2_1,
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Fig. 1 The central dogma. The central dogma of molecular biology states that DNA contains instructions for
making a protein which are copied by RNA. RNA then uses the instructions to make a protein. In short:
DNA → RNA → Protein, or DNA to RNA to Protein
transcriptomics as the predominant methodology widely used in
biological research. Understanding the transcriptome is essential
for interpreting the functional elements of the genome and reveal-
ing the molecular constituents of cells and tissues, and also for
understanding development and disease [3].
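The flow of information summarized in Fig. 1 can be illustrated with a minimal sketch. It assumes the input is the coding (sense) strand of DNA, so that transcription reduces to replacing T with U, and it uses a deliberately partial codon table containing only the codons needed for the example. The sequence and the function names are hypothetical and serve only as illustration.

```python
# Minimal sketch of DNA -> RNA -> protein.
# Assumption: the input is the coding (sense) strand, read 5' to 3'.
# The codon table is intentionally partial; unknown codons are reported as "X".
CODON_TABLE = {
    "AUG": "M",  # methionine (start codon)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA sequence matching a coding-strand DNA sequence."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon until a stop codon is reached."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 15-nucleotide open reading frame.
mrna = transcribe("ATGTTTGGCGCTTAA")
protein = translate(mrna)
print(mrna, protein)
```

A real workflow would use a complete genetic code table and handle the template strand, reading frames, and splicing, none of which is modeled here.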
The phenotype of a cell is controlled by the regulation of gene
expression, which is the basis for cell differentiation, morphogenesis,
and the adaptability of cells, which in turn determine the functional fate
of the cells in physiological processes in health and disease [4].
Cellular decisions concerning growth, differentiation, and survival are
generally reflected in altered patterns of gene expression [4]. The
study of these critical changes has always been central to any
research into biological functions of genes. Modification of gene
expression can occur at different levels. Apart from epigenetic
mechanisms (cytosine methylation, histone acetylation, and
chromatin modification), regulation can be observed at the level of
transcription initiation (transcription factors), heterogeneous nuclear
transcript processing (RNA splicing), mRNA transport from the
nucleus into the cytoplasm (nucleocytoplasmic transport factors,
e.g., exportin-5), regulation by noncoding regions of the RNA, and
translational and posttranslational modifications [5, 6].
The key aims of transcriptomics are to catalog all species of
transcript, including mRNAs, noncoding RNAs, and small RNAs;
to determine the transcriptional structure of genes, in terms of their
start sites, 5′ and 3′ ends, splicing patterns, and other posttranscriptional
modifications; and to quantify the changing expression levels
of each transcript during development and under different
conditions [2].
2 Methodology
While the transcripts originate from less than 5% of the genome in
humans and other mammals, each gene (a locus of expressed DNA)
may produce a variety of mRNA molecules through the process of
alternative splicing. Therefore, the transcriptome, from one
perspective, has a level of complexity greater than the genome that
encodes it. Reflecting a wide range of biochemical, physical, and
developmental differences, the transcriptome varies from cell to cell
and with environmental conditions. There are two types of
RNA: noncoding RNA (ncRNA) and messenger RNA (mRNA, or
protein-coding RNA). ncRNAs play several key roles in gene
regulation, including transcriptional and posttranscriptional regulation,
regulation of alternative splicing, control of transcription factor
binding, chromatin modification, and protein-coding RNA
stabilization [7]. ncRNAs include ribosomal RNAs
(rRNAs) and transfer RNAs (tRNAs), and the remaining ncRNAs are
classified into two broad groups by size. Long ncRNAs (lncRNAs) are
greater than 200 nucleotides in size, while small ncRNAs (sncRNAs)
are 200 nucleotides or less.
lncRNAs play critical and specialized roles in numerous biological
processes, including the regulation of gene expression and the
pretranscriptional and posttranscriptional modulation of epigenetic
regulation [8]. sncRNAs also have several functions: microRNAs
(miRNAs) and small interfering RNAs (siRNAs) modulate
posttranscriptional gene expression by binding to specific mRNAs.
Dysfunction of ncRNA is associated with complex diseases such as
cancer and neurological, developmental, and cardiovascular
diseases [9, 10]. More than 90% of the genome is transcribed into
RNA, and it is estimated that mRNA constitutes approximately
62% of the transcripts [7].
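The size-based grouping of ncRNAs described above amounts to a single threshold test. The following minimal sketch applies the 200-nucleotide cutoff stated in the text; the function name and the example lengths are hypothetical, and rRNAs and tRNAs are not treated separately here.

```python
# Size cutoff taken from the text: lncRNAs are longer than 200 nucleotides,
# small ncRNAs (sncRNAs) are 200 nucleotides or shorter.
SIZE_CUTOFF_NT = 200


def classify_ncrna(length_nt: int) -> str:
    """Assign a noncoding transcript to a size class by its length."""
    if length_nt <= 0:
        raise ValueError("transcript length must be a positive integer")
    return "lncRNA" if length_nt > SIZE_CUTOFF_NT else "sncRNA"


# Hypothetical transcript lengths in nucleotides.
example_lengths = [22, 200, 201, 2500]
classes = [classify_ncrna(n) for n in example_lengths]
print(list(zip(example_lengths, classes)))
```

Length alone does not establish that a transcript is noncoding; in practice that call depends on separate evidence such as the absence of a conserved open reading frame.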
Recent advances in genomic tools and technologies now enable
such comprehensive transcriptional characterization
in a robust and successful manner. In the early days,
steady-state levels of mRNA were quantitated directly: RNA was separated
by electrophoresis and transferred to a membrane, which was then incubated
with specific probes [11]. The RNA–probe complexes were detected using a
variety of different chemistries or radionuclide labeling. This
relatively laborious technique, named Northern blotting, was the first
tool used to measure RNA levels. Real-time PCR was then
developed to measure steady-state levels of mRNA by reverse
transcription of the RNA to cDNA followed by quantitative PCR (qPCR)
on the cDNA [12]. The amount of each specific target is
determined by measuring the increase in fluorescence signal from
DNA-binding dyes or probes during successive rounds of
enzyme-mediated amplification. Expression levels can be measured
relative to other genes (relative quantification) or against a standard
(absolute quantification). Real-time PCR is the gold standard in
nucleic acid quantification because of its accuracy and sensitivity
[12]. In the 1990s, expressed sequence tag (EST) sequencing was
employed to rapidly identify expressed genes and gene fragments
[13]. Although EST sequencing is a high-throughput technique, it
is expensive. Tag-based methods, including serial analysis of gene
expression (SAGE), cap analysis of gene expression (CAGE), and massively
parallel signature sequencing (MPSS), were developed, but were unable to
discriminate between isoforms and were very expensive to apply on
a large scale. Microarrays were developed for genome-wide analysis
and have become the most widely used approach for transcriptomics.
More recently, RNA sequencing (RNA-seq) using next-generation
sequencing technology has allowed the transcriptome to be
characterized, and the number of studies using RNA-seq has gradually
increased. Microarray and RNA-seq have become the main tools of
transcriptome research. These tools allow researchers to
simultaneously analyze the expression of a large number of genes and to
focus on physiological equivalence.
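Relative quantification by real-time PCR, mentioned above, is often computed with the comparative Ct (2^-ΔΔCt) calculation. The chapter does not name a specific calculation, so the sketch below is an assumption about one common approach. It also assumes roughly 100% amplification efficiency (a doubling per cycle) and a stably expressed reference gene. All Ct values and function names are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene (treated vs. control) by the 2^-ddCt calculation.

    Assumes the product doubles every cycle and that the reference gene is
    expressed at the same level in both samples.
    """
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized values between the two samples.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses the threshold two cycles earlier
# in the treated sample, while the reference gene is unchanged.
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold_change)
```

A lower Ct means more starting template, which is why the exponent carries a negative sign. Absolute quantification would instead read the quantity from a standard curve.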
For analyzing a small number of gene transcripts, quantitative
real-time PCR or pathway-focused gene expression analysis using
PCR arrays can be used. To understand the genome-wide
influence of different conditions, DNA microarrays and RNA
sequencing (RNA-seq) are frequently used. With the
advent of next-generation sequencing technology, transcriptomic
analysis has transitioned to RNA-seq [2] to quantify the abundance of
transcripts, including protein-coding transcripts (mRNA), splice
variants, and long noncoding RNA transcripts (lncRNA), in
biological samples at the genome-wide level [14]. Comparatively
speaking, RNA-seq can identify more differentially
expressed genes in various cell types than gene microarrays [2]. In
addition, some commercial lncRNA array services are
available, which systematically profile lncRNAs together with
protein-coding mRNAs.
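To make the idea of comparing transcript abundance between conditions concrete, the sketch below scales raw read counts to counts per million (CPM) and computes a log2 fold change per gene. This is only an illustration under simplifying assumptions: one sample per condition, no replicates, and no statistical test. The gene names, counts, and pseudocount are hypothetical, and this is not presented as the method used by any particular RNA-seq package.

```python
import math


def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts by library size to counts per million (CPM)."""
    library_size = sum(counts.values())
    return {gene: count * 1e6 / library_size for gene, count in counts.items()}


def log2_fold_change(cpm_control: dict, cpm_treated: dict,
                     pseudocount: float = 1.0) -> dict:
    """Per-gene log2(treated / control); the pseudocount avoids division by zero."""
    return {
        gene: math.log2((cpm_treated[gene] + pseudocount)
                        / (cpm_control[gene] + pseudocount))
        for gene in cpm_control
    }


# Hypothetical raw counts for three genes in two libraries of different depth.
control_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
treated_counts = {"geneA": 400, "geneB": 300, "geneC": 1300}

cpm_control = counts_per_million(control_counts)
cpm_treated = counts_per_million(treated_counts)
lfc = log2_fold_change(cpm_control, cpm_treated)
print({gene: round(value, 2) for gene, value in lfc.items()})
```

In this toy example geneB has the same raw count in both libraries but a negative log2 fold change after scaling, because the treated library is twice as deep. Calling genes differentially expressed requires biological replicates and a statistical model, which this sketch omits.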
3 Applications of Transcriptomics
Transcriptomic analysis allows simultaneous identification of gene
expression dynamics and differential gene expression.
Transcriptomic strategies have seen broad application across diverse areas of
biological, plant science, medical, clinical, and pharmaceutical
research for biomarker discovery and for disease diagnosis and
prognosis. Transcriptomics is useful for identifying disease biomarkers as well
as biological responses to various stimuli and stresses, and
plays a key role in advancing genomic and molecular biology
research [15–22].
4 Summary
Transcriptome researchers study the step in the central dogma
between DNA and protein, which places them in an excellent position
to examine a mediator of cellular functions. Transcriptomics represents a
valuable approach to molecular pathway discovery and biomarker
development. Both technical and statistical advances are currently
facilitating the application of this approach to disease
pathophysiology and management. Gene expression analysis is proving
invaluable for both novel pathway discovery and development of
molecular signatures that serve as clinically useful biomarkers. As
we are just entering an era of single-cell transcriptomics, the near
future will likely unravel many surprising and new characteristics of
transcriptomes. The clinical potential of RNAs as disease and
treatment markers is fueling advances in RNA-analysis methods.
Another area in which transcriptomics can contribute to both
basic and applied research is integrative omics, which combines genomic,
epigenomic, transcriptomic, proteomic, and metabolomic
data [23].
References
1. Crick FH (1970) DNA: test of structure? Science 167(3926):1694
2. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10(1):57–63. https://doi.org/10.1038/nrg2484
3. Xu S (2017) Transcriptome profiling in systems vascular medicine. Front Pharmacol 8:563. https://doi.org/10.3389/fphar.2017.00563
4. Wooten DJ, Quaranta V (2017) Mathematical models of cell phenotype regulation and reprogramming: make cancer cells sensitive again! Biochim Biophys Acta 1867(2):167–175. https://doi.org/10.1016/j.bbcan.2017.04.001
5. Raghavachari N, Liu P, Barb JJ, Yang Y, Wang R, Nguyen QT, Munson PJ (2014) Integrated analysis of miRNA and mRNA during differentiation of human CD34+ cells delineates the regulatory roles of microRNA in hematopoiesis. Exp Hematol 42(1):14–27.e11–12. https://doi.org/10.1016/j.exphem.2013.10.003
6. de Andres-Pablo A, Morillon A, Wery M (2017) LncRNAs, lost in translation or licence to regulate? Curr Genet 63(1):29–33. https://doi.org/10.1007/s00294-016-0615-1
7. Pertea M (2012) The human transcriptome: an unfinished story. Genes 3(3):344–360. https://doi.org/10.3390/genes3030344
8. Guo X, Gao L, Liao Q, Xiao H, Ma X, Yang X, Luo H, Zhao G, Bu D, Jiao F, Shao Q, Chen R, Zhao Y (2013) Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks. Nucleic Acids Res 41(2):e35. https://doi.org/10.1093/nar/gks967
9. Esteller M (2011) Non-coding RNAs in human disease. Nat Rev Genet 12(12):861–874. https://doi.org/10.1038/nrg3074
10. Taft RJ, Pang KC, Mercer TR, Dinger M, Mattick JS (2010) Non-coding RNAs: regulators of disease. J Pathol 220(2):126–139. https://doi.org/10.1002/path.2638
11. Wang RF, Cao WW, Johnson MG (1991) Development of a 16S rRNA-based oligomer probe specific for Listeria monocytogenes. Appl Environ Microbiol 57(12):3666–3670
12. Giulietti A, Overbergh L, Valckx D, Decallonne B, Bouillon R, Mathieu C (2001) An overview of real-time quantitative PCR: applications to quantify cytokine gene expression. Methods 25(4):386–401. https://doi.org/10.1006/meth.2001.1261
13. Schuler GD, Boguski MS, Stewart EA, Stein LD, Gyapay G, Rice K, White RE, Rodriguez-Tome P, Aggarwal A, Bajorek E, Bentolila S, Birren BB, Butler A, Castle AB, Chiannilkulchai N, Chu A, Clee C, Cowles S,
6 Nalini Raghavachari and Natàlia Garcia-Reyero
Day PJ, Dibling T, Drouot N, Dunham I, (6):819–828. https://doi.org/10.1007/
Duprat S, East C, Edwards C, Fan JB, s10529-017-2319-0
Fang N, Fizames C, Garrett C, Green L, 17. Flint SM, McKinney EF, Lyons PA, Smith KG
Hadley D, Harris M, Harrison P, Brady S, (2015) The contribution of transcriptomics to
Hicks A, Holloway E, Hui L, Hussain S, biomarker development in systemic vasculitis
Louis-Dit-Sully C, Ma J, MacGilvery A, and SLE. Curr Pharm Des 21(17):2225–2235
Mader C, Maratukulam A, Matise TC, McKu- 18. Gobert GN, Jones MK (2008) Discovering
sick KB, Morissette J, Mungall A, Muselet D, new schistosome drug targets: the role of tran-
Nusbaum HC, Page DC, Peck A, Perkins S, scriptomics. Curr Drug Targets 9
Piercy M, Qin F, Quackenbush J, Ranby S, (11):922–930
Reif T, Rozen S, Sanders C, She X, Silva J,
Slonim DK, Soderlund C, Sun WL, Tabar P, 19. Granata S, Dalla Gassa A, Bellin G, Lupo A,
Thangarajah T, Vega-Czarny N, Vollrath D, Zaza G (2016) Transcriptomics: a step behind
Voyticky S, Wilmer T, Wu X, Adams MD, the comprehension of the polygenic influence
Auffray C, Walter NA, Brandon R, Dehejia A, on oxidative stress, immune deregulation, and
Goodfellow PN, Houlgatte R, Hudson JR Jr, mitochondrial dysfunction in chronic kidney
Ide SE, Iorio KR, Lee WY, Seki N, Nagase T, disease. Biomed Res Int 2016:9290857.
Ishikawa K, Nomura N, Phillips C, Polymero- https://doi.org/10.1155/2016/9290857
poulos MH, Sandusky M, Schmitt K, Berry R, 20. Kan M, Shumyatcher M, Himes BE (2017)
Swanson K, Torres R, Venter JC, Sikela JM, Using omics approaches to understand pulmo-
Beckmann JS, Weissenbach J, Myers RM, Cox nary diseases. Respir Res 18(1):149. https://
DR, James MR, Bentley D, Deloukas P, Lander doi.org/10.1186/s12931-017-0631-9
ES, Hudson TJ (1996) A gene map of the 21. Lillicrap D (2002) Gene expression: overview
human genome. Science 274(5287):540–546 and clinical implications. Vox Sang 83 Suppl
14. Mortazavi A, Williams BA, McCue K, 1:77–79
Schaeffer L, Wold B (2008) Mapping and 22. Trapp J, McAfee A, Foster LJ (2017) Geno-
quantifying mammalian transcriptomes by mics, transcriptomics and proteomics: enabling
RNA-Seq. Nat Methods 5(7):621–628. insights into social evolution and disease chal-
https://doi.org/10.1038/nmeth.1226 lenges for managed and wild bees. Mol Ecol 26
15. Aziz MA, Yousef Z, Saleh AM, Mohammad S, (3):718–739. https://doi.org/10.1111/mec.
Al Knawy B (2017) Towards personalized 13986
medicine of colorectal cancer. Crit Rev Oncol 23. Gao Y, Wang F, Eisinger BE, Kelnhofer LE,
Hematol 118:70–78. https://doi.org/10. Jobe EM, Zhao X (2017) Integrative single-
1016/j.critrevonc.2017.08.007 cell transcriptomics reveals molecular networks
16. Dominguez A, Munoz E, Lopez MC, defining neuronal maturation during postnatal
Cordero M, Martinez JP, Vinas M (2017) neurogenesis. Cereb Cortex 27
Transcriptomics as a tool to discover new anti- (3):2064–2077. https://doi.org/10.1093/
bacterial targets. Biotechnol Lett 39 cercor/bhw040
Chapter 2
RNA-Seq and Expression Arrays: Selection Guidelines
for Genome-Wide Expression Profiling
Jessica Minnier, Nathan D. Pennock, Quichen Guo, Pepper Schedin,
and Christina A. Harrington
Abstract
The development of genome-wide gene expression profiling technologies over the past two decades has
produced great opportunity for researchers to explore the transcriptome and to better understand
biological systems and their perturbation. In this chapter we provide an overview of microarray and
massively parallel sequencing technologies and their application to gene expression analysis. We discuss
factors that affect expression data generation and analysis and that should be considered when applying
these technology platforms. We further present the results of a simple illustration study to highlight
performance similarities and differences in expression profiling of protein-coding mRNAs with each
platform. Based on technical and analytical differences between the two platforms, reports in the literature
comparing arrays and RNA-Seq for gene expression, and our own example study and experience, we
provide recommendations for platform selection for gene expression studies.
Key words Massively parallel sequencing, Microarray, RNA-Seq, Expression array, Expression
profiling, Differential expression
1 Introduction
The development of DNA microarrays in the 1990s transformed the
analysis of gene expression. Researchers at Stanford University [1]
and Affymetrix, Inc. [2] showed that arrays of thousands of ampli-
fied cDNAs or oligonucleotides attached to a solid surface could be
used to simultaneously measure the expression levels of hundreds of
gene transcripts. Within a few years, array technology had developed
to the point that a single DNA microarray was able to survey
transcripts from all of the annotated genes of a given organism,
and the comprehensive description of the transcriptomic landscape
took off. In the mid-2000s another technical advance, massively
parallel DNA sequencing [3–5], further expanded our ability to
study and measure the transcriptome by enabling simultaneous
Nalini Raghavachari and Natàlia Garcia-Reyero (eds.), Gene Expression Analysis: Methods and Protocols,
Methods in Molecular Biology, vol. 1783, https://doi.org/10.1007/978-1-4939-7834-2_2,
© Springer Science+Business Media, LLC, part of Springer Nature 2018
sequencing of hundreds of thousands of amplified DNA fragments.
Massively parallel sequencing applied to RNA, termed RNA-Seq
[6–8], offered theoretical advantages compared to expression
arrays; however, high initial costs and methodological challenges
limited its broad adoption. Today many RNA-Seq applications are
available at data generation costs similar to those for microarray
platforms, and both technologies are widely available
through core laboratories and commercial service providers.
While the advent of massively parallel sequencing has provided
a sequencing-based technology without some of the limitations
associated with hybridization-based DNA array assays, arrays con-
tinue to be used in many laboratories as an effective and accessible
option for measuring RNA levels. For those considering an expres-
sion profiling study, we provide guidance in choosing one approach
over the other. We describe issues to consider in selecting a tech-
nology platform and assay method for measurement of
transcriptome-wide RNA abundance and summarize how platform
characteristics have been shown to impact gene expression mea-
surements in comparison studies of RNA-Seq and expression array.
To illustrate differences and similarities in utilization and data out-
comes between the technology platforms, we present a simple,
focused gene expression study utilizing the Affymetrix GeneChip
expression array assay and the Illumina TruSeq mRNA sequencing
assay. Our results demonstrate general concordance in gene expres-
sion outcomes between the two platforms while highlighting per-
formance differences. We discuss how the choice of a particular
technology for expression profiling requires careful consideration
of study goals, platform capabilities, and practical aspects of plat-
form utilization and data analysis.
2 Technology and Workflows for Expression Array and RNA-Seq Studies
The basic steps of a gene expression assay with either RNA-Seq or
DNA arrays are conceptually similar: (1) RNA is converted to
cDNA and amplified; (2) gene transcripts present in the amplified
material are identified based on their sequences and quantified;
(3) measurement data is normalized across samples; and (4) normal-
ized data is analyzed to identify differentially expressed genes and
regulated pathways. However, while RNA-Seq and expression
arrays address the common goal of RNA profiling, they differ not
only in underlying technology but also in the details of sample
processing, transcript measurement, and data analysis (Fig. 1). In
this section we provide an overview of each technology with a focus
on genome-wide expression analysis platforms in widespread use
today. We discuss technical and analytical considerations for suc-
cessful application of either platform for transcriptome analysis and
highlight differences between the two platforms that may impact
performance and successful data outcomes.
Fig. 1 Workflow of typical gene expression profiling with expression array or RNA-Seq
2.1 Technology Platforms
DNA microarray technology is based on the hybridization of
fluorescently labeled targets prepared from RNA or DNA samples
to a large number of DNA sequences, or probes, attached to a solid
surface [9–11]. The current generation of microarrays consists of
small glass chips or slides on which many distinct oligonucleotides
(oligo probes) complementary to known genomic or RNA
sequences have been synthesized or printed in a predetermined
pattern. Oligo probe lengths vary depending on underlying tech-
nology and specific application, but are typically either 25 or
60 bases long. Expression array assay of messenger RNA (mRNA)
and long noncoding RNAs (lncRNA) requires cDNA synthesis
from total RNA and amplification through nucleic acid synthesis
to produce RNA or DNA targets for hybridization [12]. Targets are
either directly labeled with a fluorescent dye or labeled with a biotin
tag that is identified after array hybridization using a fluorescent
molecule conjugated to streptavidin. Following amplification and
labeling, targets are hybridized with the oligo probes on the array.
Depending on platform, array hybridization may involve a single
target sample or the simultaneous hybridization of a test and con-
trol sample labeled with different fluors. Location and amounts of
annealed, fluorescently labeled targets are measured by scanning
the array with a laser scanner. Fluorescence intensities for each
probe feature (many copies of a unique oligo at one location) are
averaged to produce an intensity value for each probe. Following
background subtraction and probe intensity data normalization
across arrays, relative expression levels per transcript unit are deter-
mined. The most recent arrays have millions of probe features and
are designed to measure expression levels of genes, exons, exon
splice junctions, and alternative transcript isoforms.
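The signal-processing steps described above (averaging a probe feature, subtracting background, and comparing intensities) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the pixel values, and the `floor` used to keep the log transform defined are all hypothetical, and production pipelines typically rely on model-based background correction rather than simple subtraction.

```python
import math
from statistics import mean


def probe_intensity(pixel_values, background, floor=1.0):
    """Average the pixels of one probe feature and subtract local background.

    The result is clamped at `floor` (an assumed convention) so that a
    later log transform stays defined for features at or below background.
    """
    signal = mean(pixel_values) - background
    return max(signal, floor)


def log2_ratio(test_pixels, control_pixels, background):
    """log2 ratio of two background-corrected feature intensities."""
    test = probe_intensity(test_pixels, background)
    control = probe_intensity(control_pixels, background)
    return math.log2(test / control)


# Hypothetical pixel readings for the same feature in two samples.
print(log2_ratio([900, 1100, 1000], [300, 350, 400], background=100))
```

Features whose signal falls below background all collapse to the floor value, which is one simple way to see why low-abundance transcripts are hard to quantify on arrays.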
There are two commercial array platforms in wide use today:
GeneChip™ Expression arrays (Affymetrix/Thermo Fisher Scien-
tific Inc.; see Note 1), and SurePrint G3 Expression arrays (Agilent
Technologies). Detailed platform and array information can be
found at vendor websites, including information on expression
arrays designed to measure lncRNAs as well as mRNA, and arrays
for microRNA (miRNA) profiling.
Massively parallel sequencing technology, also described as
next-generation sequencing or high-throughput deep sequencing,
uses direct sequencing of amplified libraries of DNA fragments to
analyze RNA and DNA samples [13, 14]. Established platforms use
two general sequencing approaches, sequencing by ligation or
sequencing by synthesis, to produce “short reads” of clonally
amplified DNA distributed on a solid surface. For RNA-Seq,
RNA samples are typically fragmented, reverse-transcribed, ligated
to oligo adapters and amplified to produce cDNA libraries for
sequencing [15, 16]. Transcript subsets (e.g., mRNA or focused
gene panels) may be selected from total RNA prior to cDNA
synthesis or prior to sequencing by hybridization-based capture of
targeted sequences during library preparation. Individual libraries
can be bar-coded with unique DNA tags to allow the pooling of
multiple samples per sequencing lane or chip, reducing the overall
cost of sequencing per sample. The parallel sequencing of many
different clonally amplified DNAs produces tens to hundreds of
millions of paired-end (from both ends of a DNA fragment) or
single-end DNA reads (typically 35–500 bases in length, depending
on platform and application). Raw reads are converted to FASTQ
files containing the sequence and quality scores for every read. To
identify transcripts, reads are then mapped to an annotated refer-
ence sequence when one is available. This mapping process is called
“alignment.” Sequence assembly can also be performed de novo
with very deep sequencing. Reads aligned to a reference sequence
are summarized as counts per exon, gene, or other transcript fea-
ture [17]. Unique capabilities of RNA-Seq include detection of
novel transcripts and mutations within transcripts.
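As an illustration of the FASTQ files mentioned above, the following minimal Python sketch reads records and decodes their quality strings. It assumes the common four-line record layout and Phred+33 quality encoding; the function name and the example reads are hypothetical, and real files are usually compressed and far larger, so established parsers are preferable in practice.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a FASTQ stream.

    Assumes four lines per record: '@' header, sequence, '+' separator,
    and a quality string of the same length as the sequence.
    """
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip()
        # Phred+33 (assumed encoding): score = ASCII code - 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


# Two hypothetical reads, held in memory instead of a file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n")
for read_id, seq, scores in read_fastq(example):
    print(read_id, seq, sum(scores) / len(scores))
```

The per-read mean quality printed here is the kind of summary that read-level quality control builds on before alignment.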
Massively parallel sequencing systems in current use for RNA-
Seq include the NovaSeq, HiSeq, NextSeq, and MiSeq series from
Illumina (Illumina, Inc.) and the Ion Torrent sequencing systems
(Thermo Fisher Scientific, Inc). Platform and application informa-
tion can be found at vendor websites.
The technology information presented here is not intended to
be comprehensive or in-depth, and we do not consider single
molecule, long read sequencing technologies or single cell analysis
as these techniques are currently more expensive, not as widely
available, and targeted to specific applications. We recommend
that the reader seek out the many excellent reviews on these tech-
nologies, some of which are referenced above.
2.2 Technical and Analytical Considerations
A gene expression experiment consists of five interrelated components:
(1) study design, (2) sample collection and processing,
(3) data generation, (4) data analysis, and (5) data interpretation.
The decision to select a particular platform for measurement of the
expression profile of a group of samples should include consider-
ation of the biological question, the characteristics of the samples
under study, and the capabilities of the available analysis platforms.
If the study goal is to identify which protein-coding genes or
biological pathways are changing under the study conditions, any
platform which provides a comprehensive measurement of mRNAs
should be adequate. If a broader measurement of RNA transcripts
such as miRNAs and lncRNAs is sought, then technologies and
methods that allow detection of these other RNAs are required.
Currently neither massively parallel RNA-Seq nor expression arrays
support measurement of both large RNAs (mRNAs, lncRNAs) and
miRNAs in the same assay. Both platforms do, however, allow
detection of mRNA and lncRNA in a single assay, and RNA-Seq
provides opportunity for discovery of unannotated transcripts.
The availability of sequence information and annotated genes
for the organism under study may limit approaches that can be
taken for expression profiling. In routine application of RNA-Seq, a
genomic sequence framework is required for alignment of the
sequenced DNA fragments and the quality of the available
sequence will influence the ease of analysis. DNA arrays, on the
other hand, depend on available sequence information and gene
annotation for design and synthesis of oligonucleotide probes on
the array. Multiple arrays are available for human studies and widely
used model organisms (e.g., mouse, Drosophila, and Rhesus), as
well as a large assortment of agriculturally important species.
Below we highlight aspects of data generation and analysis that
impact gene expression data quality and success of differential gene
expression analysis. These factors should be considered in study
design and selection of the optimal expression analysis platform.
2.2.1 Data Generation Considerations
• RNA biotypes and selection.
• RNA amount and quality.
• Batch effects/technical variation.
• Dynamic range and sequencing depth.
For both expression array assay and RNA sequencing applications,
the type (mRNA, total RNA, miRNA, etc.), amount, and quality of
RNA being analyzed will inform the selection of sample preparation
methods and affect data outcomes [18, 19]. For analysis of protein-
coding mRNAs and lncRNAs, array assays can directly utilize total
RNA as input at the cDNA synthesis step. Since RNA-Seq plat-
forms generate sequence reads from all cDNA fragments included
during library preparation, removal of ribosomal RNAs (rRNA) is
recommended for gene expression studies in order to avoid a
sequencing output dominated by these very high abundance structural
RNAs. Selection of polyA+ RNA is commonly used to enrich
for protein-coding mRNAs for RNA-Seq studies; alternatively,
rRNA depletion methods can be used to reduce rRNA while retaining
mRNA and lncRNA [15]. For polyA+ selection methods it is
important that RNA is largely intact (as determined by size analysis
on a Bioanalyzer [20] or similar instrument). Ribosomal RNA
depletion methods are effective and widely used for RNA-Seq;
however, sequencing libraries produced from the depleted samples
will still include lncRNAs and other RNA species. Therefore, the
number of sequence reads per library will need to be increased to
detect lower abundance RNAs. If RNA is degraded across all sam-
ples, such as with RNA extracted from formalin-fixed, paraffin-
embedded (FFPE) specimens, alternative workflows are required
for both RNA-Seq and arrays. The amount of RNA available will
also affect method selection. Methods are available for very low
RNA inputs for both arrays and RNA-Seq (as little as a hundred
picograms), but array target or library preparation costs generally
increase with the lower input amounts and sequence complexity
can be limited with the lowest inputs [21].
Both platforms are sensitive to sources of nonbiological varia-
tion among samples which can be introduced during RNA isola-
tion, sequencing library preparation and sequencing, or array target
preparation and hybridization [18, 22, 23]. Technical variation can
contribute to reduced sensitivity and batch effects due to grouping
of samples during RNA isolation or sample processing. Every effort
should be made to standardize RNA isolation techniques within a
study and extract all RNA samples at the same time in order to
minimize batch effects associated with RNA isolation [22]. Batch
effects or technical variation can be introduced at multiple steps in
preparation of sequencing libraries or array hybridization targets.
Sample randomization should be planned at the outset of any
expression study, taking into account the specific platform and target
preparation protocol, so that technical bias is minimized and
biological effects are not confounded with processing batches
[24, 25] (see Note 2).
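One way to put this randomization into practice is to shuffle samples within each biological group and then deal them across processing batches, so that every batch receives a similar mix of groups. The Python sketch below illustrates the idea; the function name, the sample identifiers, and the round-robin scheme are assumptions made for illustration, not a procedure taken from the cited references.

```python
import random
from collections import defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Assign samples to processing batches, balanced by biological group.

    samples: list of (sample_id, group) pairs.
    Returns a dict mapping batch index to a list of sample ids.
    """
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    batches = defaultdict(list)
    slot = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random order within the group
        for sample_id in ids:
            # Deal round-robin so each batch gets a similar share of the group.
            batches[slot % n_batches].append(sample_id)
            slot += 1
    return dict(batches)


# Hypothetical study: four control and four treated samples, two batches.
samples = [(f"ctrl{i}", "control") for i in range(4)]
samples += [(f"trt{i}", "treated") for i in range(4)]
print(assign_batches(samples, n_batches=2, seed=1))
```

Recording the seed alongside the sample sheet makes the batch layout auditable later, when batch is included as a covariate in the analysis.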
The dynamic range of array generated data is limited compared
to RNA-Seq [26, 27]; generally 3–5 logs of dynamic range for
arrays while RNA-Seq theoretically has an unlimited dynamic
range. In practice, however, the reliable detection of low abundance
transcripts in RNA-Seq is limited by the number of sequencing
reads [28], or sequencing depth, and higher read numbers add to
experimental cost; in general, doubling reads per sample leads to
doubling the sequencing cost of data acquisition. Furthermore,
while increasing the number of sequence reads can improve detec-
tion of low abundance transcripts, it also results in noisier data
[29]. Most applications of RNA-Seq and expression array are
directed to the determination of genes expressed at different levels
among biological conditions or sample treatments. The ability to
accurately measure fold changes of expression for particular genes
or transcripts is also impacted by platform dynamic range. Due to
its larger dynamic range RNA-Seq data can result in the detection
of larger fold changes than arrays, in particular at the high end and
low end of abundance range [26]. Additionally, probes on the array
can become oversaturated with the result that signal from highly
abundant transcripts can be attenuated leading to reduced fold
change values for highly expressed genes.
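Two of the points above lend themselves to a small worked example: the expected number of reads for a transcript scales linearly with sequencing depth, and a saturating detector compresses fold changes for highly abundant transcripts. The Python sketch below uses a deliberately crude model, with hard clipping at a ceiling standing in for probe or scanner saturation; the ceiling value and all input numbers are hypothetical.

```python
import math


def expected_reads(total_reads, transcript_fraction):
    """Expected read count for a transcript making up a given fraction of the library."""
    return total_reads * transcript_fraction


def observed_log2_fold_change(signal_a, signal_b, ceiling):
    """log2 fold change (b over a) after clipping both signals at a detector ceiling."""
    a = min(signal_a, ceiling)
    b = min(signal_b, ceiling)
    return math.log2(b / a)


# A rare transcript at one part per million of the library (hypothetical).
for depth in (10_000_000, 20_000_000, 40_000_000):
    print(depth, expected_reads(depth, 1e-6))

# A true fourfold change, measured below and near an assumed ceiling of 65535.
print(observed_log2_fold_change(100, 400, ceiling=65535))
print(observed_log2_fold_change(20_000, 80_000, ceiling=65535))
```

In the second comparison the true log2 fold change is 2, but clipping the larger signal yields a smaller observed value, which mirrors the attenuation described for oversaturated probes.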
2.2.2 Data Analysis Considerations
• Data processing and normalization.
• Analysis pipelines and bioinformatics skills.
• Statistical methods and expertise.
• Experimental design and power analysis.
Microarray data are expression intensities measured at the probe
level. RNA-Seq data are reads of fragmented DNA sequences. Due
to the differences in raw data type, the processing and analysis
workflows differ in multiple ways, including normalization and
statistical analysis of differential gene expression (see Fig. 1b). For
both types of data, raw measurements are not comparable
between samples because of inherent technical variability (e.g., total
number of reads per sample, or differences in the dynamic range
detected). To compare gene-level measurements across samples
and groups, data must therefore be normalized to a similar scale or
distribution [30]. Array data is normalized at the probe level,
often after background subtraction, and then must be summarized
to gene or transcript level by aggregating probe expression into
probesets, through methods such as Robust Multichip Average
(RMA) [31] which incorporates quantile normalization and
median polish summarization for averaging probeset intensities.
There are many other normalization options—loess, q-splines, for
example [30]—and bioinformatics expertise is often needed to
determine the best normalization for the data generated depending
on observed variability and technical artifacts as well as specific array
design. RNA-Seq reads must be preprocessed (to remove repeated
sequences added in library preparation or for barcoding, as well as
low-quality sequences) and then aligned or mapped to a reference
genome or transcriptome. Annotation for RNA-Seq must be per-
formed to map either reads or probe sets to transcriptome units
(protein-coding genes, small RNAs, pseudogenes, etc.) to generate
gene, transcript or exon counts [32] while annotation for array
probes/probe sets is usually supplied by vendor software or infor-
mation sheets. Many genome annotation databases are available for
RNA-Seq (e.g., RefGene, Ensembl, and UCSC Genome Browser),
and the selection of the database can greatly influence downstream
analyses [33] as well as comparisons between array and RNA-Seq
expression [34] (see Note 3 for further discussion). Normalization
of RNA-Seq must take into account systematic variation such as
differences in library sizes (sequencing depth) across samples
[34]. Alignment and annotation require extensive bioinformatics
expertise. Due to the additional processing steps required for
RNA-Seq, additional QC must be done to examine sequence qual-
ity and alignment statistics.
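To make the two normalization ideas above concrete, the following Python sketch shows quantile normalization, the step that RMA incorporates for arrays, and a simple library-size scaling for RNA-Seq counts. Counts per million is used here as one common convention and is an assumption of this sketch, as are the function names and the toy data; ties are not treated specially, and dedicated packages should be used for real analyses.

```python
def quantile_normalize(samples):
    """Force every sample to share the same distribution of values.

    samples: list of equal-length lists, one list of intensities per array.
    Each value is replaced by the mean, across samples, of the values
    that hold the same rank.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples) for rank in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from low to high
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def counts_per_million(counts):
    """Scale raw read counts for one sample by its library size."""
    library_size = sum(counts)
    return [c * 1_000_000 / library_size for c in counts]


# Toy data: two arrays with three probes, and one RNA-Seq sample with three genes.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
print(counts_per_million([10, 30, 60]))
```

After quantile normalization both toy arrays contain the same set of values in different orders, which is the defining property of the method.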
To measure differentially expressed (DE) genes or transcripts,
differences in expression between groups must be assessed with
statistical analyses. For array data, typical linear models can be
used to test for DE of a gene (linear regression, ANOVA, t-tests)
because log-expression is a continuous measurement that is approx-
imately distributed as a Normal (Gaussian) random variable.
RNA-Seq data, however, is measured in integer counts which do
not follow the typical Normal distribution due to the dependence
of the variance on the mean. Therefore, statistical methods that
model count data must be used to test for DE such as generalized
linear models assuming a negative binomial distribution of the read
counts for a gene [35]. Alternatively, methods for estimating
“variance weights” (e.g., the voom method [36]) can be implemented to
allow RNA-Seq to be analyzed with the same linear models used for
array DE analysis.
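The dependence of the variance on the mean can be made explicit with a short calculation. The Python sketch below uses a commonly used negative binomial parameterization, variance = mean + dispersion × mean², and compares it with the Poisson case, where the variance equals the mean. The dispersion value of 0.1 is a hypothetical choice for illustration.

```python
def poisson_variance(mean):
    """Variance of a Poisson count with the given mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Variance of a negative binomial count: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


# Compare the two models across a range of expression levels.
for mean in (10, 100, 1000):
    nb_var = negative_binomial_variance(mean, dispersion=0.1)
    print(mean, poisson_variance(mean), nb_var)
```

At low counts the two variances are of similar size, while at high counts the quadratic term dominates. This is why a single constant-variance model fits raw counts poorly.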
Due to the high-throughput nature of both platforms, adjustment
for multiple testing after DE analysis is necessary and can be
performed with Bonferroni or other p-value adjustments or with
control of the False Discovery Rate (FDR) through procedures
such as Benjamini–Hochberg adjustment or Storey’s q-value
approach [37] (see web-app http://qvalue.princeton.edu/). As
with all studies, the success of a study depends heavily on experi-
mental design and sufficient replicates to have statistical power to
address hypotheses of interest (see Note 2) and to detect differences
with a reasonable FDR. Platform factors (e.g., library prep, sequencing
lane, hybridization group) should be randomized across samples,
and processing steps and data generation groups should be
balanced to avoid confounding biological and technical factors.
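The Bonferroni and Benjamini–Hochberg adjustments mentioned above are simple enough to write out directly. The Python sketch below is a minimal illustration with hypothetical p-values; the function names are chosen for this example, and in practice the equivalent routines in established statistics packages should be preferred.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values for FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

The Benjamini–Hochberg values are never larger than the Bonferroni values, which reflects the less conservative nature of FDR control when thousands of genes are tested.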
Software: Open source Unix and R-based software is available
for all of these analysis steps [38]. Most alignment methods are
Unix- or Python-based programs (e.g., STAR, TopHat2, Subread [32]),
and most statistical analysis and normalization is available in R/Bio-
conductor packages. This software requires moderate program-
ming expertise or collaborations with bioinformaticians and/or
statisticians. Alternatively, some commercial options are available
for processing, analysis, and visualization of data from both
platforms (e.g., Partek, Transcriptome Analysis Console, GeneSpring;
see Note 4 for more information). These products implement many
of the same algorithms and methods as open source software but
with user-friendly interfaces. Due to the maturity of array plat-
forms, recommended normalizations for arrays in these packages
can provide excellent results. Standard analysis pipelines for
RNA-Seq are still evolving; selection of optimal mapping, normalization,
and data filtering strategies is best done in collaboration
with an experienced analyst or biostatistician. For comprehensive
views of challenges and best practices for RNA-Seq data generation,
processing, and analysis see refs. 39, 40.
3 Comparison of RNA-Seq and Expression Array
Many studies have compared the expression patterns and differential
expression results of RNA-Seq and array data on various platforms
and sample types (Table 1) [26, 27, 41–45]. Most studies
Table 1
Comparisons of RNA-Seq and microarray in the literature—technology and design details
| Reference | Array type^a | RNA-Seq platform | Library prep | Sequencing details^b | Sample | Experimental design |
|---|---|---|---|---|---|---|
| Marioni et al. [41] | Affymetrix HG-U133 Plus 2 | Illumina Genome Analyzer | polyA+ selection | 32 bp, ~8–15 mil | One human male sample from liver and kidney | Seven technical replicates, two cDNA conc., two sequencing runs, three arrays |
| Bottomly et al. [43] | Affymetrix MOE 2.0, Illumina MouseRef-8 | Illumina GAIIx | polyA+ selection, Illumina mRNA-Seq Sample Preparation kit | 300 bp, 21 samples in 21 lanes | Adult B6 and D2 strain mice; variation in prep dates, sex of samples among platforms | 10+ strain replicates, all male for RNA-Seq and Illumina array, equal male/female Affymetrix array |
| Raghavachari et al. [42] | Affymetrix Human Exon 1.0 ST | Illumina GAIIx | polyA+ selection (Illumina recommendations) | Single lane, 36 cycles, sequencing depth ~10 mil | Human Whole Blood (PAXGene), GLOBINclear depletion | Six patients sickle cell disease, four controls |
| Zwemer et al. [44] | Affymetrix HG-U133 Plus 2 | Illumina HiSeq 2000 (HudsonAlpha Institute) | NuGEN Ovation RNA-Seq V2 | Paired end, 50 bp, ~17–100 mil pairs; low alignment stats (5–35% for all but 1) | Human amniotic fluid | Three male, two female fetuses |
| SEQC Consortium [45] | Affymetrix HG-U133 Plus 2, Affymetrix HuGene2.0, Affymetrix PrimeView, Illumina Bead array | Illumina HiSeq 2000, Life Technologies SOLiD 5500, Roche 454 GS FLX | Varies with site | Paired end; Illumina: 100 bp, ~110 mil pairs; SOLiD: 51/36 bp, ~50 mil; Roche: ~1 mil | Human, rat, multiple tissue types | Many samples in consortium |
| Zhao et al. [26] | Affymetrix HG-U133 Plus 2 | Illumina HiSeq 2000 (BGI) | polyA+ selection | Paired end, 90 bp | Human CCR6+ CD4+ T cells | 6 time points, +/- stimulation in duplicate |
| Yu et al. [27] | Agilent Hum Genome 4x44K, Illumina Hum HT12v4, Affymetrix Hum Gene 1.0 and HTA2.0 | Illumina HiSeq 2500 | SMARTer Ultra Low RNA kit, Epicentre RiboZero | 50 bp, ~36 mil reads | One pool of human bone marrow RNAs, Universal Human Ref RNA (Agilent) | Two samples plus ratio-varied mix of the two samples (1–3 reps) |
| Nazarov et al. [46] | Affymetrix HTA 2.0 | Illumina HiSeq 2000 | Ribosomal RNA depletion | Paired end, 77 bp, 120–280 mil reads | Matched primary human lung tumor and adjacent tissue | Nine matched samples |
^a Total RNA
^b Unstranded and single end unless otherwise stated
have concluded that microarray platforms suffer from such techni-
cal issues as cross-hybridization, nonspecific hybridization, and
limited range of detection of individual probes. As a result, genes
with low expression levels below or near background may exhibit
increased variability and fold changes calculated using these genes
might be difficult to detect with statistical significance. RNA-Seq is
not dependent on the selection of known genes or transcripts, and
furthermore does not depend on the performance of probes or
probesets thereby avoiding the sensitivity and saturation limitations
of arrays. The extended dynamic range of RNA-Seq allows for the
detection of larger fold changes. A recent report [46], however,
suggests that the larger dynamic range of RNA-Seq may suffer
limitations due to technical variability of expression measurements
across samples, especially for low abundance transcripts for which it
is difficult to obtain precise abundance measurements without
many replicates. Sequenced reads can be aligned to any assembled
genome or transcriptome and so with sufficient read depth can lead
to the discovery of novel splice-variants, mutations, and isoforms.
This has the advantage of “future-proofing” the data in that reads
can be realigned to future versions of transcriptomes that may have
revised gene definitions. However, as previously discussed, the
processing and alignment of RNA-Seq data necessitate a substantial
level of bioinformatics expertise, a burden exacerbated by the lack of a
gold-standard set of processing and analysis workflows [40]. Other
disadvantages are that some RNA-Seq reads align to multiple locations
in a genome, and that certain transcripts are subject to library preparation
biases that lead to nonuniform coverage of the transcriptome and
an inaccurate representation of abundance. When the amount of
material is limited, high-density arrays can be more powerful in
detecting genes than RNA-Seq [42]. Both platforms can be hin-
dered by sample contamination, but if RNA-Seq samples have
inadequate removal of highly abundant structural RNAs these
reads can oversaturate the data and dominate expression estimates
so that other genes have lower read counts or are not detected.
Despite the reliability of both platforms for measuring gene
expression, direct integration or comparison of data across
platforms remains a challenge. First, RNA-Seq and arrays often do
not measure the same genomic unit, in that the transcripts measured
by RNA-Seq do not match one-to-one with the transcripts measured by
array probesets within the same gene. Realigning RNA-Seq data to a
common set of transcripts from the array
data can improve integration but may result in a loss of information
and precision from sequencing data that is discarded. It is impor-
tant to implement thorough transcript mapping methods to ensure
that the definitions of genes measured in RNA-Seq are comparable
to array probesets [47] (see Note 3). Even after ensuring RNA-Seq
and array are measuring expression intensity on a comparable set of
transcripts or genes, differences in dynamic range and technical
RNA-Seq and Expression Arrays: Selection Guidelines for Genome-Wide. . . 19
variation between the two platforms limit reliable integration of
data at the gene intensity level for differential expression analysis.
Careful consideration of how the experiments differ in processing
and normalization is needed before information from data sets can
be combined [48]. However, each platform’s strengths and weak-
nesses can potentially complement each other and improve accu-
racy of results. See Note 5 for further discussion.
3.1 Illustration Study Comparing RNA-Seq and DNA Array
We conducted a simple platform comparison using RNA extracted from
mouse mammary fibroblasts to illustrate the expression profiling
workflows and results of two platforms: a sequencing technology,
Illumina TruSeq, and a DNA array technology, the Affymetrix GeneChip
WT (whole transcript) assay. This comparison
uses two sample groups (fibroblasts isolated from nulliparous or
postweaning mice) with two biological replicates each. These sam-
ples were selected from a larger RNA-Seq study reported by Guo
[49]. When performing gene expression comparisons, the ability to
statistically determine changes is based upon the magnitude of
changes in a given gene between groups (between-group variation
or differences) compared to the magnitude of variation of the gene
within the group (within-group variation or similarity). Principal
component analysis of gene expression using the RNA-Seq data
identified the within-group variation to be small, while gene
expression differences between treatment groups were distinct (large
between-group variation). We believe these samples are well suited
to a cross platform comparison in which we are interested in
breadth of gene detection as well as platform variability measure-
ments. We consider this design to represent a minimal experimental
plan for generating useful information about platform performance
for differential gene expression measurements focused on protein-
coding mRNAs.
In this study, experimental samples were collected and RNA
was isolated in the research laboratory. Subsequent steps for expres-
sion data generation using array and RNA-Seq procedures were
performed in the OHSU Gene Profiling Shared Resource and the
OHSU Massively Parallel Sequencing Shared Resource, respec-
tively, following standard protocols. Sample processing and data
generation costs were similar for the two platforms as applied in this
study.
3.1.1 Materials
1. RNA samples.
Total RNA isolated with NucleoSpin RNA (Macherey-Nagel)
was provided to the core laboratories. RNA samples were of
high quality: Bioanalyzer RIN values of 9.4–9.9.
Four mouse mammary fibroblast samples were selected for
analysis: two biological replicates from condition 1, and two
biological replicates from condition 2.
2. RNA-Seq: library preparation, quantification, and sequencing.
20 Jessica Minnier et al.
(a) TruSeq RNA Sample Preparation V2 Kit (Illumina, Inc.)
(also listed on Illumina website as TruSeq RNA Library
Prep Kit v2).
(b) KAPA Library Quantification Kit—Illumina/ABI Prism®
(Kapa Biosystems).
(c) HiSeq® SBS Kit v4 (Illumina, Inc.).
3. Expression Array: target labeling, array hybridization, and
processing.
(a) GeneChip WT Pico Reagent Kit (Affymetrix, Inc.).
(b) GeneChip Hyb, Wash, and Stain Kit (Affymetrix, Inc.).
(c) GeneChip MouseTranscriptome Array (now sold through
ThermoFisher as the GeneChip Clariom D mouse array).
The Mouse Transcriptome Array interrogates ~23,000
protein-coding genes (transcript clusters) and ~55,000
long noncoding RNA genes (transcript clusters)
(https://tools.thermofisher.com/content/sfs/brochures/EMI07313-2_DS_Clariom-D_solutions_HMR.pdf).
The array is
designed to support both global gene-level analyses and
transcript isoform measurements.
3.1.2 Methods
1. RNA-Seq: library preparation, quantification, and sequencing.
PolyA+ RNA was selected from 125 ng total RNA, fragmented, and
converted to amplified cDNA libraries according to manufacturer
instructions
(https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_truseq/truseqrna/truseq-rna-sample-prep-v2-guide-15026495-f.pdf).
Four samples were pooled per sequencing lane and sequenced on an
Illumina HiSeq 2500 with 100 cycles to produce 100-base reads. The
resulting data were converted to FASTQ format using bcl2fastq
software (Illumina) and transferred for alignment and data
analysis. Approximately 50 million reads per sample were generated.
2. Expression array: target labeling and array hybridization, pro-
cessing, and scanning.
Ten nanograms of total RNA were reverse-transcribed to
double-stranded cDNA, amplified, and biotin-labeled according to
manufacturer instructions
(https://tools.thermofisher.com/content/sfs/manuals/703262-WT-Pico-Reagent-Kit-UG-rev-5.pdf).
Five micrograms of biotin-labeled cDNA were hybridized to the Mouse
Transcriptome/Clariom D mouse array in a 16 h incubation at 45 °C.
Arrays were then washed and stained on a GeneChip Fluidics Station
450 and scanned on a GeneChip Scanner 3000 7G according to
Affymetrix recommendations. All array images passed visual
inspection for quality. Raw data files were converted to .CEL files
with Affymetrix GeneChip Command Console (AGCC) and transferred for
data analysis.
3.1.3 Data Analysis
1. RNA-Seq: Reads were aligned to the mouse reference genome,
build GRCm38 using STAR 2.4.2a [50]. STAR performed
counting of reads per gene as defined in Ensembl build
81 (GENCODE version m6). Read count distributions were
normalized across samples using the Trimmed Mean of
M-values (TMM) method from the edgeR package [51] in
Bioconductor (https://www.bioconductor.org/). Variance
stabilization was performed with the voom method in the
limma Bioconductor package [52]. Normalized read counts
were log2-transformed for further analysis. Genes were
removed from downstream analysis if fewer than two samples
exhibited counts per million (CPM) greater than 0.2
(corresponding to a read count of 7–9) as recommended by
edgeR’s user guide. The number of genes kept in the analysis
was 13,618 out of the 45,706 genes with nonzero raw read
counts. Density plots of the distribution of read counts across
samples, heatmaps and PCA clustering plots were used to
visualize the distributions and clustering of the data.
Differential expression (DE) was determined by fitting linear
regression models for each log2 normalized gene expression
level with condition 2 vs 1 as the independent variable, using
the limma package. Empirical Bayes moderation of the stan-
dard errors was used to compute moderated t-statistics and p-
values. Multiple hypothesis-testing was accounted for by
controlling the False Discovery Rate (FDR) at 5% with the q-
value approach [37]. The set of significant genes was further
filtered to include only genes with absolute value of log2-fold
change (FC) greater than 0.5 (1.4-fold). MA plots and volcano
plots were used to visualize the differential expression results.
Tests for overrepresentation of biological process gene ontol-
ogy (GO) terms in the set of significant genes were conducted
with the limma package for GO analysis [53], and significant
terms were defined as p-value <0.05.
2. Expression arrays: The Robust Multichip Average (RMA) algo-
rithm in the oligo package [54] was used for background sub-
traction, quantile normalization, and median-polish probe
summarization. Probes were summarized to the transcript cluster
level; a transcript cluster is a group of transcripts (possibly more
than one) mapped to a gene
(http://tools.thermofisher.com/content/sfs/brochures/exon_probeset_trans_clust_whitepaper.pdf).
Normalized intensities were log2-transformed for further
analysis. The 95th percentile of the normalized intensity distri-
bution of antigenomic background control transcript clusters was
calculated as a background cutoff, and transcript clusters were
removed from further analysis if fewer than two samples exhib-
ited normalized expression levels above this cutoff. The number
of transcript clusters kept in the analysis was 13,230 out of
65,956 total on the array. DE analysis, visualization of data and
results, and GO analyses were performed with the same methods
as for the RNA-Seq data using the limma package.
3. Cross-platform analysis: Genes were mapped between
RNA-Seq and expression array data using the Ensembl identi-
fiers provided by Affymetrix (for array) and from Bioconductor
package AnnotationDbi (for RNA-Seq). If an Ensembl id was
not available, gene symbols were used to match genes. If the
RNA-Seq symbol mapped to multiple transcript clusters, we
chose the first match. The log2-FC for each gene was compared
across platforms by calculating Pearson correlations and by fitting
univariate linear regressions of log2-FC in RNA-Seq on log2-FC in
array across all genes; the results were visualized with
scatterplots (Fig. 2).
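The two low-expression filters described in items 1 and 2 can be sketched in a few lines. This is a minimal illustration in Python rather than the authors' R code, and it makes some simplifying assumptions: CPM is computed from raw library sizes (edgeR can also use normalized library sizes), the function names are hypothetical, and the toy matrices are invented purely to exercise the rules.

```python
import numpy as np

def cpm(counts):
    """Counts per million for a genes x samples matrix of raw read counts."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)  # total reads per sample (raw library size)
    return counts / lib_sizes * 1e6

def keep_by_cpm(counts, min_cpm=0.2, min_samples=2):
    """True for genes with CPM > min_cpm in at least min_samples samples."""
    return (cpm(counts) > min_cpm).sum(axis=1) >= min_samples

def keep_by_background(log2_intensity, control_rows, pct=95, min_samples=2):
    """True for transcript clusters above the control-derived cutoff in at
    least min_samples samples; the cutoff itself is also returned."""
    x = np.asarray(log2_intensity, dtype=float)
    cutoff = np.percentile(x[control_rows], pct)
    return (x > cutoff).sum(axis=1) >= min_samples, cutoff

# Toy RNA-Seq counts: three genes plus a filler row so that every sample
# totals 40 million reads (0.2 CPM then corresponds to 8 reads).
genes = np.array([
    [0, 3, 2, 12],             # above 0.2 CPM in one sample only
    [9, 10, 8, 11],            # above 0.2 CPM in three samples
    [5000, 4800, 5100, 4900],  # well expressed
])
filler = 40_000_000 - genes.sum(axis=0)
counts = np.vstack([genes, filler])
seq_keep = keep_by_cpm(counts)

# Toy log2 array intensities; rows 0 and 1 stand in for the antigenomic
# background control transcript clusters.
intensity = np.array([
    [3.0, 3.1, 2.9, 3.2],
    [3.4, 3.3, 3.5, 3.6],
    [8.0, 8.2, 7.9, 8.1],
    [3.5, 3.9, 3.2, 3.3],
])
array_keep, cutoff = keep_by_background(intensity, control_rows=[0, 1])

print("RNA-Seq genes kept:", seq_keep)
print("Array clusters kept:", array_keep, "cutoff:", cutoff)
```

Both filters share the same "at least two samples" rule, which with two replicates per group allows a gene expressed in only one condition to be retained.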
R code for the statistical analyses described above is available
online (https://github.com/jminnier/rnaseq_microarray_comparison).
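As a language-neutral illustration of the significance filter used for both platforms (FDR of 5% combined with an absolute log2-FC above 0.5), the sketch below uses Python. Note the substitution: the study used the q-value approach [37], whereas this example uses the closely related Benjamini-Hochberg adjustment because it is simple to write out by hand. The function names and the toy p-values are assumptions made for illustration only.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def select_de(pvalues, log2fc, fdr=0.05, min_abs_log2fc=0.5):
    """Genes passing both the FDR threshold and the fold-change threshold."""
    adjusted = bh_adjust(pvalues)
    return (adjusted < fdr) & (np.abs(np.asarray(log2fc, dtype=float)) > min_abs_log2fc)

# Invented example: five genes with raw p-values and log2 fold changes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
log2fc = [1.2, -0.3, 2.0, -0.9, 0.1]

print("adjusted p-values:", bh_adjust(pvals))
print("selected as DE:   ", select_de(pvals, log2fc))
```

In this toy case the second gene is significant but fails the fold-change threshold, and the third and fourth genes have large fold changes but fail the FDR threshold, so only the first gene is retained.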
Fig. 2 Log2-Fold change (Condition 2 vs 1) from microarray and RNA-Seq data. Each point represents a gene
that is represented on both platforms (8520 genes total). Genes were divided into four panels based on
quantiles of average RNA-Seq log2-expression. Darker shades represent larger average abundance. Pear-
son’s correlation as well as the intercept and slope from the univariate regression model of log2-FC in
RNA-Seq regressed on log2-FC in array are displayed. Black lines represent the diagonal slope of 1, which
would occur when fold changes in the platforms are identical. A slope larger than 1 from the univariate
regression denotes a trend of larger fold change magnitudes measured in RNA-Seq.
3.2 Performance Comparison Results
Expression array and RNA-Seq identified approximately 13,230
and 16,696 genes as above background, respectively; of these
genes, 8520 (64% of array, 51% of RNA-Seq) were identified by
both. The Affymetrix array used in this study is designed to measure
mRNAs and lncRNAs (and junction sequences) whereas the
RNA-Seq experiment selected for polyA+ RNA. Therefore, we
would expect the RNA-Seq results to be enriched in protein-coding
genes as a percentage of all genes measured whereas the array data
should have a broader profile. As seen in Table 2, a higher percent-
age of genes measured in RNA-Seq were protein coding compared
to array: 14,554 (87% of RNA-Seq) and 9262 (70% of array after
background filtering) were protein-coding with 8254 identified by
both platforms (97% of overlap) (see Note 6 for further discussion
of non-mRNA reads in the RNA-Seq data). The correlation of log2-FC
for the set of overlapping genes was high: 0.70–0.95 within
each quartile of average gene expression, and 0.88 overall (Fig. 2).
Correlation between platforms is strongest for the more highly
expressed genes; compare r = 0.70 for the lowest expression
quartile with r = 0.95 for the highest expression quartile. As expected,
we observed an extended linear range of RNA-Seq expression
values compared to array (voom normalized log2-expression in
RNA-Seq [min approx. 0, max 15.91] vs normalized log2-expres-
sion in array [min 3.93, max 14.48]).
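The cross-platform comparison behind Fig. 2 (Pearson correlation plus a regression of RNA-Seq log2-FC on array log2-FC, overall and within quartiles of average expression) can be sketched as follows. This is a minimal Python illustration, not the authors' R code. The data are simulated under stated assumptions (a true slope of 1.3 and noise that shrinks as expression increases) and do not reproduce the study's values; the function names are hypothetical.

```python
import numpy as np

def compare_fold_changes(fc_seq, fc_array):
    """Pearson r, intercept, and slope of fc_seq regressed on fc_array."""
    x = np.asarray(fc_array, dtype=float)
    y = np.asarray(fc_seq, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, 1)
    return r, intercept, slope

def by_expression_quartile(fc_seq, fc_array, mean_expr):
    """Repeat the comparison within quartiles of average expression (1 = lowest)."""
    fc_seq = np.asarray(fc_seq, dtype=float)
    fc_array = np.asarray(fc_array, dtype=float)
    mean_expr = np.asarray(mean_expr, dtype=float)
    edges = np.quantile(mean_expr, [0.25, 0.5, 0.75])
    quartile = np.searchsorted(edges, mean_expr, side="right")  # values 0..3
    return {
        q + 1: compare_fold_changes(fc_seq[quartile == q], fc_array[quartile == q])
        for q in range(4)
    }

# Simulated data (assumption): noisier RNA-Seq fold changes at low expression.
rng = np.random.default_rng(0)
n = 2000
mean_expr = rng.uniform(0, 15, n)          # average log2 expression
fc_array = rng.normal(0, 1, n)             # array log2-FC
noise_sd = 1.0 - 0.05 * mean_expr          # 1.0 at low, 0.25 at high expression
fc_seq = 1.3 * fc_array + rng.normal(0, noise_sd)

overall = compare_fold_changes(fc_seq, fc_array)
per_quartile = by_expression_quartile(fc_seq, fc_array, mean_expr)

print("overall r=%.2f intercept=%.2f slope=%.2f" % overall)
for q, (r, b0, b1) in per_quartile.items():
    print("quartile %d: r=%.2f intercept=%.2f slope=%.2f" % (q, r, b0, b1))
```

A slope above 1 in such a fit corresponds to larger fold change magnitudes in RNA-Seq than in array, which is how the regression lines in Fig. 2 are read.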
Table 2
Number of genes and gene ontology terms detected by each platform separately as well as with both platforms

Number                                        Microarray        RNA-Seq           Platform overlap
Genes(a) quantified:
  protein-coding (all RNA) genes              26,596 (65,956)   22,001 (45,706)   22,001 (37,921)
Genes(a) after low expression filtering(b)    9262 (13,230)     14,554 (16,696)   8254 (8520)
DE genes(a), FDR 5%                           671 (962)         1918 (2095)       550 (553)
  As subset of overlap(c)                     601 (604)         1241 (1268)
DE genes(a), FDR 5% and |log2FC| > 0.5        484 (745)         1595 (1770)       411 (414)
  As subset of overlap(c)                     431 (434)         956 (983)
GO terms identified                           11,365            12,964            11,298
GO terms, p < 0.05                            889               1682              662

(a) Number of protein-coding genes (number of total genes)
(b) Filtering: For microarray data, the 95th percentile of the normalized intensity distribution of antigenomic background
control transcript clusters was calculated as a background cutoff, and transcript clusters were removed from further
analysis if fewer than two samples exhibited normalized expression levels above this cutoff. RNA-Seq genes were removed
from further analysis if fewer than two samples exhibited CPM greater than 0.2. This constitutes the set of genes tested
for differential expression
(c) Numbers in the "As subset of overlap" rows are a subset of the 8520 genes found on both platforms. For each platform,
these correspond to significant detection in that platform regardless of whether the gene was significant in the other platform
(d) DE denotes differentially expressed genes; GO denotes Gene Ontology terms belonging to the Biological Process
ontology
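The overlap percentages quoted in the text of this section follow directly from the counts in Table 2. The short Python check below recomputes them; the dictionary keys are hypothetical labels chosen for readability, while the numbers are taken from the table.

```python
# Counts taken from Table 2 (protein-coding counts unless marked "all").
table2 = {
    "array_all_filtered": 13230,
    "seq_all_filtered": 16696,
    "overlap_all_filtered": 8520,
    "array_pc_filtered": 9262,
    "seq_pc_filtered": 14554,
    "overlap_pc_filtered": 8254,
    "array_de": 671,
    "both_de": 550,
    "array_go_sig": 889,
    "seq_go_sig": 1682,
    "both_go_sig": 662,
}

def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

t = table2
checks = {
    "overlap as % of array genes": pct(t["overlap_all_filtered"], t["array_all_filtered"]),
    "overlap as % of RNA-Seq genes": pct(t["overlap_all_filtered"], t["seq_all_filtered"]),
    "protein-coding % of RNA-Seq genes": pct(t["seq_pc_filtered"], t["seq_all_filtered"]),
    "protein-coding % of array genes": pct(t["array_pc_filtered"], t["array_all_filtered"]),
    "protein-coding % of overlap": pct(t["overlap_pc_filtered"], t["overlap_all_filtered"]),
    "array DE genes also DE in RNA-Seq": pct(t["both_de"], t["array_de"]),
    "RNA-Seq significant GO terms seen in array": pct(t["both_go_sig"], t["seq_go_sig"]),
    "array significant GO terms seen in RNA-Seq": pct(t["both_go_sig"], t["array_go_sig"]),
}

for label, value in checks.items():
    print(f"{label}: {value}%")
```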
Differential expression analysis of these filtered genes resulted
in detection of 671 genes by array and 1918 by RNA-Seq with 82%
of array DE genes present in RNA-Seq results (Table 2). The range
of log2-FCs was approximately doubled using RNA-Seq compared to
array ([-7.57, 8.22] vs [-3.62, 4.11]), and the median absolute
log2-FC for genes with q-value <0.05 was 0.94 in RNA-Seq vs 0.70 in
array. Genes found to have significant (q-value <0.05) differential
expression on both platforms similarly had larger absolute log2-FC
in RNA-Seq than in array ([0.42, 8.21], median 1.10 vs [0.33, 4.11],
median 0.66).
Note that in RNA-Seq many of the genes with the largest FCs have
0 or very low counts in one of the conditions. Also, FCs are
calculated from cumulative gene counts in RNA-Seq or average
probe expression for array, which may impact sensitivity of detected
FCs depending on gene structure. Data from RNA-Seq resulted in
larger FCs on average, and a higher number of differentially
expressed genes and overrepresented GO terms (Table 2).
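To relate the log2-FC values quoted above to linear fold changes, and to see why a gene with zero or near-zero counts in one condition can produce a very large log2-FC, a small Python sketch may help. The offset of 0.5 reads added before taking the ratio is an assumption chosen for illustration, and the function names and example counts are hypothetical; this is not the study's voom-based calculation.

```python
import math

def fold(log2fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc

def log2fc_with_offset(mean_a, mean_b, offset=0.5):
    """log2 fold change of condition b vs a, with a small offset so that
    zero counts do not produce an undefined ratio (illustrative choice)."""
    return math.log2((mean_b + offset) / (mean_a + offset))

# Linear equivalents of log2-FC values mentioned in the text.
for value in (0.5, 0.70, 0.94, 4.11, 8.22):
    print(f"log2-FC {value:5.2f} -> {fold(value):8.2f}-fold")

# A gene with no reads in one condition versus a well-covered gene whose
# expression doubles (invented mean counts).
print("0 vs 150 reads:     log2-FC = %.2f" % log2fc_with_offset(0, 150))
print("1000 vs 2000 reads: log2-FC = %.2f" % log2fc_with_offset(1000, 2000))
```

The first case shows how the estimate for a gene absent in one condition is driven largely by the offset rather than by a measured ratio, which is one reason the largest RNA-Seq fold changes deserve cautious interpretation.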
The results of our comparison study show that for genes
detected with both array and RNA-Seq the FC values are highly
concordant. Discrepancies are greatest for the lowest quartile of
RNA abundance measurements. This is consistent with the techni-
cal differences in signal detection between the two platforms as
previously discussed. For genes with the largest differences in
expression between sample groups, the fold changes detected by
RNA-Seq tend to be larger than those measured with array.
Therefore, for studies focusing on estimating magnitude of change,
especially large differences, RNA-Seq may be the preferred
platform. For the analysis of the array data, we used standard
methods, e.g., RMA normalization and summarization; however,
alternative strategies are available that may increase sensitivity and
FC dynamic range for the array data (see Transcriptome Analysis
Console, Note 4). Regardless, due to hybridization-related issues
with arrays and other previously discussed technical features,
RNA-Seq is likely to generate greater fold changes and exhibit
increased DE sensitivity for the most lowly expressed genes and
this is evident here. In sum, almost all protein-coding differentially
expressed genes detected by array are also identified by RNA-Seq
and functional annotations are highly concordant. As expected,
however, RNA-Seq detected more protein-coding genes and
more DE genes than expression array. Despite the lower number
of protein-coding genes detected with array, the number of
biological process GO terms identified with each platform was
similar, and approximately 40% of the GO terms measured at
p < 0.05 in RNA-Seq were also seen with array while 74% of
significant GO terms with array were seen in RNA-Seq. The higher
levels of non-mRNA transcripts detected with the array platform
were not explored in this study because of the experimental focus
on polyA+ RNAs. The expression datasets were produced in our
core laboratories at similar costs for each platform; however, there
were additional informatics costs for the RNA-Seq data analysis
compared with the array analysis.
4 Guidelines for Expression Assay Selection
Microarrays are an established and reliable platform for gene
expression profiling; studies using arrays have produced accurate
and reproducible expression data for over two decades. More
recently, RNA-Seq has offered a powerful technology capable of
measuring transcript abundance by direct sequencing of amplified
cDNA and producing sensitive transcript detection and highly
accurate DE fold change measurements. As shown through our
example study, both RNA-Seq and expression array assays produce
informative, concordant, and useful results with an identical sample
source. For many studies, expression profiling performed with
either arrays or RNA-Seq will provide accurate and sufficient infor-
mation about differentially expressed genes and regulated pathways
to guide biological understanding and decision-making for future
experiments. For a hypothesis-testing or initial exploratory study or
for verification of expression differences, arrays offer a convenient
and affordable option with easily accessible and standardized data
analysis tools for self-directed data interrogation. RNA-Seq has
clear advantages over expression arrays for detection of novel
transcripts, isoforms, and sequence variations, as well as an expanded
dynamic range for measurement of differential gene expression.
However, the increased sequencing depths required for reliable
detection of low abundance transcripts, alternative isoforms and
rare events come at greater cost. Therefore, when measurement of
rare transcripts is needed, RNA-Seq is the better option, but there
may be higher data generation costs and the data analysis require-
ments must be addressed.
Some general considerations during the platform selection
process should be kept in mind. Array options depend on available
genome sequence information and vendor product offerings. While
RNA-Seq can be run on any RNA sample, efficient alignment and
analysis of the data requires a transcriptome or genome framework.
Arrays can offer more flexibility for low sample numbers at reason-
able cost, since some arrays are packaged individually or in small
numbers thus allowing only a few samples to be processed at a time.
This contrasts with some massively parallel sequencing platforms
which require that a flow cell with multiple lanes or a chip be filled
with multiple samples prior to a sequencing run to allow cost-
efficient operation. Arrays also provide cost-efficient options for
working directly with total RNA to analyze both mRNA and
lncRNA on a single array. These and other considerations for
selecting one platform over the other are summarized in Fig. 3.
Fig. 3 Performance and use considerations for gene expression profiling with microarrays versus RNA-Seq
Questions that should be addressed prior to platform selection
are the following:
1. What is the experimental system (species, tissue type, likely
RNA quality and amount)?
2. What are the expression study goals?
3. What resources and expertise are available for data manage-
ment and analysis (see Notes 4 and 7)?
4. What financial resources are available for data generation and
data analysis?
5. What is the timeline for study completion?
With answers to the above questions in hand, the platform
selection process for a gene expression study may proceed as out-
lined in Fig. 4. Optimally, the decision process will include platform
specialists and any data analysts or statisticians involved in the study.
Consider three scenarios. (1) You are interested in studying well-
annotated genes or identifying regulation of characterized path-
ways in a human study or widely used model system. In this case
arrays offer an accessible and analytically standardized platform at
relatively low cost that will meet study goals. (2) You are looking for
novel splice variants, rare transcripts, or allele-specific expression in
your study samples. In this case RNA-Seq would be the appropriate
Fig. 4 Technology platform selection process for gene expression profiling
choice, although there may be additional cost for the sequencing
coverage required to identify rare events. (3) You wish to identify
gene expression changes that are driving changes in an experimen-
tal system and need to determine treatment conditions, time
points, etc., at which relevant transcriptional activity is occurring
prior to a larger study. A preliminary analysis with an inexpensive
array assay can provide information on changes in known genes or
predicted pathways that can be used to select conditions for an
in-depth RNA-Seq study. Ultimately, the platform selection deci-
sion should be based on the resources and qualified expertise
available to you and the depth and breadth of information you
require. Practical considerations such as cost and turnaround
times at your local core or other genomic technology resource
will also factor into your decision.
5 Notes
1. ThermoFisher Scientific recently acquired Affymetrix, Inc., the
developer of the GeneChip™ expression array. Affymetrix
products are now marketed through ThermoFisher
(www.thermofisher.com).
2. For successful data generation and useful and interpretable
results, we strongly recommend developing the study design with
the guidance of an experienced statistician or other data analysis
expert. Any experimental plan should account for both techni-
cal variation and batch effects in the sample processing and
assay steps. Power analyses should be performed to determine
the sample size needed for the study, but will only be as
accurate as the assumptions made. Accuracy of power analyses
can be improved by examining the distribution of data and
group differences of interest from pilot studies or similar data
sets. For outbred populations the sample sizes needed to
detect moderate or small differences can be prohibitive. Studies
with complex designs require statistical expertise at both the
design and analysis stages to ensure hypotheses of interest can
be analyzed correctly with the data generated.
3. When comparing results from RNA-Seq and microarray, care
needs to be taken that the data for each platform is annotated
using comparable units of the transcriptome, based on gene
and transcript definitions [27, 55]. This could mean defining
genes as a subset of the full gene probeset from the array and a
subset of exons measured in RNA-Seq in order to create over-
lapping transcript sets measured by both platforms. Legacy
data from older array platforms or RNA-Seq data aligned to
older reference genomes and/or annotated with outdated
transcriptome information will likely need to be realigned and
annotated before platform comparison and integration.
4. Commercial software options: Partek® provides analysis suites
for both array and RNA-Seq (Genomics Suite®, Flow®;
http://www.partek.com/). ThermoFisher provides Transcriptome
Analysis Console (TAC) for GeneChip array data for free
download (https://www.thermofisher.com). GeneSpring GX
is available from Agilent Technologies
(http://www.genomics.agilent.com/) and CLC Genomics Workbench for
RNA-Seq is available through Qiagen
(https://www.qiagenbioinformatics.com/products/); there are several
others at various price points [56]. Most of the same algorithms
and methods implemented by these products are also available
in various packages in the free and open source software
Bioconductor (http://bioconductor.org/), which uses the statistical
programming language R. For RNA-Seq analysis, storage
and processing capabilities of computing hardware also need to
be considered.
5. For additional integration discussion see [48] and references
therein. Expression array and RNA-Seq results can be com-
bined at the ratio level, such that fold changes of genes between
groups of interest can be calculated with data from both plat-
forms and sets of significant genes from each platform can be
combined to test for enriched pathways. Additionally,
RNA-Seq can be used as a discovery tool to identify genes
and transcripts of interest in the experiment and array probesets
can then be selected to cheaply and rapidly quantify the expres-
sion of those genes in a large cohort [42]. Abundance results in
RNA-Seq can be used to filter out background noise expression
in array by removing genes from downstream analyses that are
not present in an RNA-Seq experiment on the same samples,
thereby increasing power of the analysis of the array data [57].
6. The majority of RNA measured in our RNA-Seq experiment
should be from protein-coding genes due to polyA selection
prior to library preparation. Of the 45k genes with nonzero
read counts in our data, only 22k were protein-coding
(Table 2). Most of the expression measurements from other
types of RNA (antisense, miRNA, lncRNA, pseudogenes, and
small noncoding RNA) were removed after filtering based on
low read counts. This suggests that low count measurements
could be due to spurious alignment errors or library contami-
nation and this illustrates a common challenge of RNA-Seq
data processing and analysis.
7. Both expression array and RNA-Seq generate large amounts of
data. (RNA-Seq can require close to 100× the data storage
needs that microarray requires. As an example, for our
illustration data set of four samples, the MTA array data—.cel
files and annotation files, not including raw .dat files—were
~1 Gb, while RNA-Seq data—fastq, alignment databases, and
other processing files—were ~100 Gb). These large data sets
can present challenges both for efficient analysis and data
storage. In addition, batch effects and experimental design issues
can influence results in nonobvious ways. If you are doing the
analysis yourself, multiple commercial options (see Note 4)
are available, but many analysis settings must be chosen and
default settings may not always be best.
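As a rough illustration of the power analysis recommended in Note 2, the Python sketch below estimates the number of samples per group needed to detect a given log2 difference for a single gene. It relies on several simplifying assumptions that are not from the chapter: a two-sided, two-sample comparison, a normal approximation, equal group sizes, and a known within-group standard deviation. The function name and the example inputs are hypothetical. It is meant only to show how strongly the answer depends on the assumed variability and effect size, and it does not replace dedicated power tools or a statistician's input.

```python
import math
from statistics import NormalDist

def samples_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided two-sample comparison.

    delta: difference in mean log2 expression to detect
    sigma: within-group standard deviation (log2 scale), assumed known
    alpha: per-gene significance level (use a stricter value to allow
           for multiple testing)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Invented scenarios: a twofold change (delta = 1 on the log2 scale) with
# low variability, and a smaller change with the same variability.
for delta in (1.0, 0.5):
    n = samples_per_group(delta, sigma=0.5, alpha=0.001)
    print(f"delta={delta}, sigma=0.5, alpha=0.001 -> {n} samples per group")
```

Halving the difference of interest roughly quadruples the required sample size, which is the reason small effects in outbred populations can become prohibitive.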
Acknowledgments
The authors thank Julja Burchard for enthusiastic and thoughtful
discussions on the design and content of the chapter. We thank
Dr. Robert Searles for manuscript review and expert advice on
RNA-Seq methods. We thank Caitlin Harrington-Smith for crea-
tive assistance with figures and Amy Carlos and Kristina Vartanian
for excellent technical assistance. This work was supported in part
by the OHSU Knight Cancer Institute (NIH NCI Cancer Center
Support Grant P30 CA069533-17) and NIH/NCI R01CA169175
(to P. Schedin).
References
1. Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270(5235):467–470
2. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14(13):1675–1680. https://doi.org/10.1038/nbt1296-1675
3. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IM, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DM, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara ECM, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang GD, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM, O’Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Ernest Sohna Sohna J, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL, Turcatti G, Vandevondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456(7218):53–59. https://doi.org/10.1038/nature07517
4. Bentley DR (2006) Whole-genome re-sequencing. Curr Opin Genet Dev 16(6):545–552. https://doi.org/10.1016/j.gde.2006.10.009
5. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski JR, Chinault C, Song XZ, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM, Margulies M, Weinstock GM, Gibbs RA, Rothberg JM (2008) The complete genome of an individual by massively parallel DNA sequencing. Nature 452(7189):872–876. https://doi.org/10.1038/nature06884
6. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628. https://doi.org/10.1038/nmeth.1226
7. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320(5881):1344–1349. https://doi.org/10.1126/science.1158441
8. Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J (2008) Dynamic repertoire of a
11. Wheelan SJ, Martinez Murillo F, Boeke JD (2008) The incredible shrinking world of DNA microarrays. Mol Biosyst 4(7):726–732. https://doi.org/10.1039/b706237k
12. Schulze A, Downward J (2001) Navigating gene expression using microarrays--a technology review. Nat Cell Biol 3(8):E190–E195. https://doi.org/10.1038/35087138
13. Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17(6):333–351. https://doi.org/10.1038/nrg.2016.49
14. Moorthie S, Mattocks CJ, Wright CF (2011) Review of massively parallel DNA sequencing technologies. HUGO J 5(1–4):1–12. https://doi.org/10.1007/s11568-011-9156-3
15. Hrdlickova R, Toloue M, Tian B (2017) RNA-Seq methods for transcriptome analysis. Wiley Interdiscip Rev RNA 8(1). https://doi.org/10.1002/wrna.1364
16. Chu Y, Corey DR (2012) RNA sequencing: platform selection, experimental design, and data interpretation. Nucleic Acids Ther 22(4):271–274. https://doi.org/10.1089/nat.2012.0367
17. Oshlack A, Robinson MD, Young MD (2010) From RNA-seq reads to differential expression results. Genome Biol 11(12):220. https://doi.org/10.1186/gb-2010-11-12-220
18. Fasold M, Binder H (2014) Variation of RNA quality and quantity are major sources of batch effects in microarray expression data. Microarrays (Basel) 3(4):322–339. https://doi.org/10.3390/microarrays3040322
19. Schuierer S, Carbone W, Knehr J, Petitjean V, Fernandez A, Sultan M, Roma G (2017) A comprehensive assessment of RNA-seq protocols for degraded and low-quantity samples. BMC Genomics 18(1):442. https://doi.org/10.1186/s12864-017-3827-y
20. Schroeder A, Mueller O, Stocker S, Salowsky R, Leiber M, Gassmann M,
eukaryotic transcriptome surveyed at single- Lightfoot S, Menzel W, Granzow M, Ragg T
nucleotide resolution. Nature 453 (2006) The RIN: an RNA integrity number for
(7199):1239–1243. https://doi.org/10. assigning integrity values to RNA measure-
1038/nature07002 ments. BMC Mol Biol 7:3. https://doi.org/
9. Lockhart DJ, Winzeler EA (2000) Genomics, 10.1186/1471-2199-7-3
gene expression and DNA arrays. Nature 405 21. Shanker S, Paulson A, Edenberg HJ, Peak A,
(6788):827–836. https://doi.org/10.1038/ Perera A, Alekseyev YO, Beckloff N, Bivens NJ,
35015701 Donnelly R, Gillaspy AF, Grove D, Gu W,
10. Bumgarner R (2013) Overview of DNA micro- Jafari N, Kerley-Hamilton JS, Lyons RH,
arrays: types, applications, and their future. Tepper C, Nicolet CM (2015) Evaluation of
Curr Protoc Mol Biol. Chapter 22:Unit commercially available RNA amplification kits
22.21. https://doi.org/10.1002/ for RNA sequencing using very low input
0471142727.mb2201s101 amounts of total RNA. J Biomol Tech 26
32 Jessica Minnier et al.
(1):4–18. https://doi.org/10.7171/jbt.15- based on variance and bias. Bioinformatics 19
2601-001 (2):185–193
22. Leek JT, Scharpf RB, Bravo HC, Simcha D, 32. Engstrom PG, Steijger T, Sipos B, Grant GR,
Langmead B, Johnson WE, Geman D, Kahles A, Ratsch G, Goldman N, Hubbard TJ,
Baggerly K, Irizarry RA (2010) Tackling the Harrow J, Guigo R, Bertone P, Consortium R
widespread and critical impact of batch effects (2013) Systematic evaluation of spliced align-
in high-throughput data. Nat Rev Genet 11 ment programs for RNA-seq data. Nat Meth-
(10):733–739. https://doi.org/10.1038/ ods 10(12):1185–1191. https://doi.org/10.
nrg2825 1038/nmeth.2722
23. van Dijk EL, Jaszczyszyn Y, Thermes C (2014) 33. Zhao S, Zhang B (2015) A comprehensive
Library preparation methods for next- evaluation of ensembl, RefSeq, and UCSC
generation sequencing: tone down the bias. annotations in the context of RNA-seq read
Exp Cell Res 322(1):12–20. https://doi.org/ mapping and gene quantification. BMC Geno-
10.1016/j.yexcr.2014.01.008 mics 16:97. https://doi.org/10.1186/
24. Auer PL, Doerge RW (2010) Statistical design s12864-015-1308-8
and analysis of RNA sequencing data. Genetics 34. Dillies MA, Rau A, Aubert J, Hennequet-
185(2):405–416. https://doi.org/10.1534/ Antier C, Jeanmougin M, Servant N,
genetics.110.114983 Keime C, Marot G, Castel D, Estelle J,
25. Yang H, Harrington CA, Vartanian K, Coldren Guernec G, Jagla B, Jouneau L, Laloe D, Le
CD, Hall R, Churchill GA (2008) Randomiza- Gall C, Schaeffer B, Le Crom S, Guedj M,
tion in laboratory procedure is key to obtaining Jaffrezic F, French StatOmique C (2013) A
reproducible microarray results. PLoS One 3 comprehensive evaluation of normalization
(11):e3724. https://doi.org/10.1371/jour methods for Illumina high-throughput RNA
nal.pone.0003724 sequencing data analysis. Brief Bioinform 14
26. Zhao S, Fung-Leung WP, Bittner A, Ngo K, (6):671–683. https://doi.org/10.1093/bib/
Liu X (2014) Comparison of RNA-Seq and bbs046
microarray in transcriptome profiling of acti- 35. Anders S, Huber W (2010) Differential expres-
vated T cells. PLoS One 9(1):e78644. sion analysis for sequence count data. Genome
https://doi.org/10.1371/journal.pone. Biol 11(10):R106. https://doi.org/10.1186/
0078644 gb-2010-11-10-r106
27. Yu J, Cliften PF, Juehne TI, Sinnwell TM, 36. Law CW, Chen Y, Shi W, Smyth GK (2014)
Sawyer CS, Sharma M, Lutz A, Tycksen E, voom: precision weights unlock linear model
Johnson MR, Minton MR, Klotz ET, Schriefer analysis tools for RNA-seq read counts.
AE, Yang W, Heinz ME, Crosby SD, Head RD Genome Biol 15(2):R29. https://doi.org/10.
(2015) Multi-platform assessment of transcrip- 1186/gb-2014-15-2-r29
tional profiling technologies utilizing a precise 37. Storey JD (2002) A direct approach to false
probe mapping methodology. BMC Genomics discovery rates. J R Stat Soc Series B Stat Meth-
16:710. https://doi.org/10.1186/s12864- odol 64(3):479–498
015-1913-6 38. Dudoit S, Gentleman RC, Quackenbush J
28. Sims D, Sudbery I, Ilott NE, Heger A, Ponting (2003) Open source software for the analysis
CP (2014) Sequencing depth and coverage: of microarray data. Biotechniques Suppl:45–51
key considerations in genomic analyses. Nat 39. Conesa A, Madrigal P, Tarazona S, Gomez-
Rev Genet 15(2):121–132. https://doi.org/ Cabrero D, Cervera A, McPherson A, Szczesniak
10.1038/nrg3642 MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A
29. Tarazona S, Garcia-Alcalde F, Dopazo J, (2016) A survey of best practices for RNA-seq
Ferrer A, Conesa A (2011) Differential expres- data analysis. Genome Biol 17:13. https://doi.
sion in RNA-seq: a matter of depth. Genome org/10.1186/s13059-016-0881-8
Res 21(12):2213–2223. https://doi.org/10. 40. Auer PL, Srivastava S, Doerge RW (2012) Dif-
1101/gr.124321.111 ferential expression—the next generation and
30. Park T, Yi SG, Kang SH, Lee S, Lee YS, Simon beyond. Brief Funct Genomics 11(1):57–62.
R (2003) Evaluation of normalization methods https://doi.org/10.1093/bfgp/elr041
for microarray data. BMC Bioinformatics 4:33. 41. Marioni JC, Mason CE, Mane SM,
https://doi.org/10.1186/1471-2105-4-33 Stephens M, Gilad Y (2008) RNA-seq: an
31. Bolstad BM, Irizarry RA, Astrand M, Speed TP assessment of technical reproducibility and
(2003) A comparison of normalization meth- comparison with gene expression arrays.
ods for high density oligonucleotide array data Genome Res 18(9):1509–1517. https://doi.
org/10.1101/gr.079558.108
RNA-Seq and Expression Arrays: Selection Guidelines for Genome-Wide. . . 33
42. Raghavachari N, Barb J, Yang Y, Liu P, 49. Guo Q, Minnier J, Burchard J, Chiotti K,
Woodhouse K, Levy D, O’Donnell CJ, Mun- Spellman P, Schedin P (2017) Physiologically
son PJ, Kato GJ (2012) A systematic compari- activated mammary fibroblasts promote post-
son and evaluation of high density exon arrays partum mammary cancer. JCI Insight 2(6):
and RNA-seq technology used to unravel the e89206. https://doi.org/10.1172/jci.insight.
peripheral blood transcriptome of sickle cell 89206
disease. BMC Med Genet 5:28. https://doi. 50. Dobin A, Davis CA, Schlesinger F, Drenkow J,
org/10.1186/1755-8794-5-28 Zaleski C, Jha S, Batut P, Chaisson M, Gingeras
43. Bottomly D, Walter NA, Hunter JE, TR (2013) STAR: ultrafast universal RNA-seq
Darakjian P, Kawane S, Buck KJ, Searles RP, aligner. Bioinformatics 29(1):15–21. https://
Mooney M, McWeeney SK, Hitzemann R doi.org/10.1093/bioinformatics/bts635
(2011) Evaluating gene expression in 51. McCarthy DJ, Chen Y, Smyth GK (2012) Dif-
C57BL/6J and DBA/2J mouse striatum ferential expression analysis of multifactor
using RNA-Seq and microarrays. PLoS One 6 RNA-Seq experiments with respect to
(3):e17820. https://doi.org/10.1371/jour biological variation. Nucleic Acids Res 40
nal.pone.0017820 (10):4288–4297. https://doi.org/10.1093/
44. Zwemer LM, Hui L, Wick HC, Bianchi DW nar/gks042
(2014) RNA-Seq and expression microarray 52. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW,
highlight different aspects of the fetal amniotic Shi W, Smyth GK (2015) Limma powers dif-
fluid transcriptome. Prenat Diagn 34 ferential expression analyses for
(10):1006–1014. https://doi.org/10.1002/ RNA-sequencing and microarray studies.
pd.4417 Nucleic Acids Res 43(7):e47. https://doi.
45. Consortium SM-I (2014) A comprehensive org/10.1093/nar/gkv007
assessment of RNA-seq accuracy, reproducibil- 53. Young MD, Wakefield MJ, Smyth GK, Oshlack
ity and information content by the Sequencing A (2010) Gene ontology analysis for RNA-seq:
Quality Control Consortium. Nat Biotechnol accounting for selection bias. Genome Biol 11
32(9):903–914. https://doi.org/10.1038/ (2):R14. https://doi.org/10.1186/gb-2010-
nbt.2957 11-2-r14
46. Nazarov PV, Muller A, Kaoma T, Nicot N, 54. Carvalho BS, Irizarry RA (2010) A framework
Maximo C, Birembaut P, Tran NL, for oligonucleotide microarray preprocessing.
Dittmar G, Vallar L (2017) RNA sequencing Bioinformatics 26(19):2363–2367. https://
and transcriptome arrays analyses show oppos- doi.org/10.1093/bioinformatics/btq431
ing results for alternative splicing in patient 55. Chavan SS, Bauer MA, Peterson EA, Heuck
derived samples. BMC Genomics 18(1):443. CJ, Johann DJ Jr (2013) Towards the integra-
https://doi.org/10.1186/s12864-017-3819-y tion, annotation and association of historical
47. Su Z, Fang H, Hong H, Shi L, Zhang W, microarray experiments with RNA-seq. BMC
Zhang W, Zhang Y, Dong Z, Lancashire LJ, Bioinformatics 14(Suppl 14):S4. https://doi.
Bessarabova M, Yang X, Ning B, Gong B, org/10.1186/1471-2105-14-S14-S4
Meehan J, Xu J, Ge W, Perkins R, Fischer M, 56. Mehta JP, Rani S (2011) Software and tools for
Tong W (2014) An investigation of biomarkers microarray data analysis. Methods Mol Biol
derived from legacy microarray data for their 784:41–53. https://doi.org/10.1007/978-1-
utility in the RNA-seq era. Genome Biol 15 61779-289-2_4
(12):523. https://doi.org/10.1186/s13059-
014-0523-y 57. Miller JA, Menon V, Goldy J, Kaykas A, Lee
CK, Smith KA, Shen EH, Phillips JW, Lein ES,
48. Mooney M, McWeeney S (2014) Data integra- Hawrylycz MJ (2014) Improving reliability and
tion and reproducibility for high-throughput absolute quantification of human brain micro-
transcriptomics. Int Rev Neurobiol array data by filtering and scaling probes using
116:55–71. https://doi.org/10.1016/B978- RNA-Seq. BMC Genomics 15:154. https://
0-12-801105-8.00003-5 doi.org/10.1186/1471-2164-15-154
Chapter 3
A Guide for Designing and Analyzing RNA-Seq Data
Aniruddha Chatterjee, Antonio Ahn, Euan J. Rodger,
Peter A. Stockwell, and Michael R. Eccles
Abstract
The identity of a cell or an organism is at least in part defined by its gene expression and therefore analyzing
gene expression remains one of the most frequently performed experimental techniques in molecular
biology. The development of the RNA-Sequencing (RNA-Seq) method allows an unprecedented opportu-
nity to analyze expression of protein-coding, noncoding RNA and also de novo transcript assembly of a new
species or organism. However, the planning and design of RNA-Seq experiments has important implica-
tions for addressing the desired biological question and maximizing the value of the data obtained. In
addition, RNA-Seq generates a huge volume of data and accurate analysis of this data involves several
different steps and choices of tools. This can be challenging and overwhelming, especially for bench
scientists. In this chapter, we describe an entire workflow for performing RNA-Seq experiments. We
describe critical aspects of wet lab experiments such as RNA isolation, library preparation and the initial
design of an experiment. Further, we provide a step-by-step description of the bioinformatics workflow for
different steps involved in RNA-Seq data analysis. This includes power calculations, setting up a computa-
tional environment, acquisition and processing of publicly available data if desired, quality control mea-
sures, preprocessing steps for the raw data, differential expression analysis, and data visualization. We
particularly mention important considerations for each step to provide a guide for designing and analyzing
RNA-Seq data.
Key words RNA-Seq, Genome, Gene expression, Differential expression, Transcript, Sequencing,
Sequenced read
1 Introduction
An organism comes into being by modulating its gene expression.
From the DNA strand, the transcription of genes or a subset of
genes into complementary, single stranded RNA molecules defines
the biological activity and phenotype of a particular cell or cell type.
At a given time, the total amount of the synthesized RNA is
referred to as the transcriptome [1]. A change in the transcriptome
is likely to have functional consequences and therefore the study of
gene expression is crucial to understand altered phenotypes and
properties of a cell in disease and development. As a result, profiling
gene expression remains the most commonly performed experiment
in molecular biology.
Nalini Raghavachari and Natàlia Garcia-Reyero (eds.), Gene Expression Analysis: Methods and Protocols,
Methods in Molecular Biology, vol. 1783, https://doi.org/10.1007/978-1-4939-7834-2_3,
© Springer Science+Business Media, LLC, part of Springer Nature 2018
From the early days of expression analysis, quantitative polymerase
chain reaction (qPCR) has been used as the gold standard
method for quantifying transcripts [2]. Although qPCR is still
widely used, it is a low-throughput method and is mainly used for
validation and for analyzing a small number of genes. Over the last two
decades, constant efforts have been made to improve throughput and to
profile gene expression at a genome scale. The use of
hybridization-based microarray technologies transformed the
scale at which transcriptomics studies were performed [3]. Microarrays
provided an opportunity to analyze thousands of transcripts
accurately at low cost and dominated the area of genome-wide
expression studies. However, microarray platforms have several
limitations [4, 5]. To design probes for interrogation, microarrays
require prior knowledge of the genome under investigation.
Cross-hybridization of repetitive probes, and the artifacts it
induces, remains a long-standing problem with microarray platforms.
In addition, the opportunity for analysis of alternative transcripts
and noncoding RNAs is limited with microarrays.
The advent of high-throughput next-generation sequencing
(NGS) has revolutionized the scale at which genomes or transcriptomes
can be analyzed [6]. NGS methods have enabled direct
sequencing of all the expressed transcripts in a cell at high coverage
(RNA-Sequencing or RNA-Seq) [7]. As RNA-Seq provides the
opportunity for direct and unbiased sequencing of expressed genomic
loci in a cell, it has many additional applications (apart from
detecting changes in mRNA expression) that were not possible
with standard microarrays or previous methodologies.
These include the identification of spliced isoforms, novel transcripts,
allele-specific expression, and de novo transcript assembly
of a new species or organism. Further, messenger RNA
(mRNA) long remained the most well-studied RNA species,
as the central dogma of molecular biology involves the flow of
information via mRNA (DNA to mRNA to protein) [8, 9]. However,
a large body of work on epigenetics in the last decade, especially
the ENCODE project [10], revealed the importance of
nongenic transcripts and other RNA species (e.g., microRNA, small
and long noncoding RNAs) in regulating transcription and translation.
RNA-Seq provides an opportunity to profile noncoding RNA
classes at an unprecedented scale [1, 7, 11]. Further, international
consortia such as The Cancer Genome Atlas have also used RNA-Seq to
create transcriptome data for cancer patients [12].
There are several modifications and designs available for the
RNA-Seq platform and these can determine the type of data
obtained from an RNA-Seq experiment. Therefore, planning and
design of RNA-Seq experiments have important implications in
addressing the desired biological question and maximizing the
value of the obtained data. In addition, RNA-Seq generates a
huge volume of data and accurate analysis of this data involves
several different steps and choices of tools. This could be challeng-
ing and overwhelming, especially for bench scientists. In this chap-
ter, we describe an entire workflow of performing RNA-Seq
experiments. There are several articles, online documentation,
and protocols that detail some of these steps, and therefore, to
avoid repetition, we have directed the readers to appropriate
sources and focused on aspects that are less covered and are crucial
for decision-making. We describe critical steps for the wet lab
experiments such as RNA isolation, library preparation and the
initial design of an experiment. Further, we provide a step-by-step
description of the bioinformatics workflow for different steps
involved in RNA-Seq data analysis. This includes power calcula-
tions, setting up a computational environment, acquisition and
processing of publicly available data if desired, quality control mea-
sures, preprocessing steps for the raw data, differential expression
analysis, and data visualization. We particularly mention important
considerations for each design and analysis step. We believe that this
will serve as a valuable guide and framework for designing and
analyzing RNA-Seq data for different biological questions.
2 Materials
1. RNAlater RNA Stabilization Reagent (Qiagen).
2. RNeasy mini kit (Qiagen).
3. RNeasy FFPE kit (Qiagen).
4. RNeasy MinElute Cleanup Kit (Qiagen).
5. QIAshredder spin columns (Qiagen).
6. Proteinase K (Qiagen).
7. DNase I (ThermoFisher Scientific).
8. Dextran, Mr 450,000-650,000 (Sigma-Aldrich).
9. Ficoll-Paque PLUS (GE Healthcare).
10. Gibco RPMI Medium 1640 (ThermoFisher Scientific).
11. Gibco Trypsin/EDTA solution (ThermoFisher Scientific).
12. Phosphate buffered saline, PBS: 137 mM NaCl, 2.7 mM KCl,
10 mM Na2HPO4, 1.8 mM KH2PO4, pH 7.4.
13. Absolute ethanol, analytical grade.
14. β-Mercaptoethanol (Sigma-Aldrich).
15. Qubit fluorometer (Life Technologies).
16. Nanodrop ND-1000 spectrophotometer.
17. UltraPure DNase/RNase-Free Distilled Water (ThermoFisher
Scientific).
18. TruSeq Total RNA-Seq library prep kit (Illumina).
19. 2100 Bioanalyzer and RNA kit (Agilent Technologies).
20. HiSeq2000/2500 sequencing system (Illumina).
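As a rough aid for preparing the PBS listed in item 12, the following sketch converts the stated millimolar concentrations into grams of salt per litre. It assumes anhydrous salts and standard molar masses (hydrated forms would need different values), and the function name `grams_needed` is hypothetical rather than part of any kit or library.

```python
# Molar masses in g/mol. Assumption: anhydrous salts; hydrates differ.
MOLAR_MASS = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
}

# PBS composition from the Materials list, in mM.
PBS_MM = {"NaCl": 137.0, "KCl": 2.7, "Na2HPO4": 10.0, "KH2PO4": 1.8}


def grams_needed(conc_mm, volume_l=1.0):
    """Return grams of each salt needed for `volume_l` litres of buffer."""
    return {
        salt: round(mm / 1000.0 * MOLAR_MASS[salt] * volume_l, 3)
        for salt, mm in conc_mm.items()
    }


recipe = grams_needed(PBS_MM)
for salt, grams in recipe.items():
    print(f"{salt}: {grams} g per litre")
```

The pH still has to be adjusted to 7.4 after dissolving the salts; the calculation only covers the weighed amounts.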
3 Methods
3.1 RNA-Sequencing Experimental Design
3.1.1 RNA Source and RNA Extraction
It is important to consider the source of RNA material and the
quality to be used for the RNA-Seq experiments. We use the
RNeasy kit (Qiagen, Hilden, Germany) according to the manufacturer’s
instructions to extract total RNA from (a) tissue samples
(fresh; frozen; or formalin-fixed, paraffin-embedded); (b) leukocytes
isolated from peripheral blood; or (c) cultured cell lines
(Fig. 1). Precautionary measures are required to avoid contamination
and/or degradation of samples: work areas and equipment
should be thoroughly cleaned regularly with decontamination solution;
use disposable gloves, sterile RNAse-free plasticware, and
aerosol barrier pipette tips. Carry out all procedures at room temperature,
unless otherwise specified.
Fig. 1 Workflow for extracting RNA from tissue samples, leukocytes isolated from blood, or cultured cells. RNA
can be successfully extracted from fresh tissue or tissues that have been frozen, stored in RNAlater, or
formalin-fixed, paraffin-embedded (FFPE). RNA can also be successfully extracted from leukocytes isolated
from fresh blood or from cultured cell lines. For each type of analysis, it is recommended to closely adhere to
the protocols supplied with the RNeasy kit, to use a Qubit fluorometer for accurate quantification, and to store
long term at −80 °C
Prior to RNA extraction, tissues can be stabilized with at least
10 volumes of RNAlater (Qiagen) and stored for up to 4 weeks at
4 °C. For long-term storage, incubate overnight at 4 °C and either
transfer to −20 °C or remove the RNAlater reagent and store at
−80 °C. Before disruption and homogenization in the RNA purification
procedure, the RNAlater must be removed. Note that RNAlater
is not recommended for isolated cells or whole blood.
1. To extract RNA from fresh, frozen, or RNAlater-treated tissue
samples (maximum ~30 mg, ~27 mm³), immediately grind to a
fine powder under liquid nitrogen using a mortar and pestle.
Transfer the suspension into a liquid nitrogen-cooled tube and
allow evaporation of the liquid nitrogen but without allowing
the sample to thaw. Add 600 μL RLT buffer containing
β-mercaptoethanol and immediately proceed to homogeniza-
tion using a QIAshredder homogenizer (Qiagen).
2. To extract RNA from formalin fixed paraffin embedded (FFPE)
tissue using the RNeasy FFPE kit, freshly cut 5–20 μm sections
(maximum recommended: 4 × 10 μm thick, each having
<250 mm² surface area), transfer to a microcentrifuge tube
with 320 μL xylene and vortex. Incubate for 3 min at 56 °C and
after cooling to room temperature, add 240 μL PKD buffer
and vortex. Centrifuge for 1 min at 10,000 rpm (9390 × g),
add 10 μL proteinase K to the lower phase and incubate for
15 min at 56 °C, followed by 15 min at 80 °C on a shaking heat
block. Transfer the lower phase to a new microcentrifuge tube,
incubate on ice for 3 min and centrifuge for 15 min at
13,500 rpm (17,115 × g). Transfer the supernatant to a new
tube and add 10 DNase booster buffer and 10 μL pre-prepared
DNase I stock and incubate for 15 min at room temperature.
Add 500 μL RBC buffer and mix thoroughly. Add
1.2 mL of 100% ethanol, mix thoroughly and immediately
load onto an RNeasy MinElute spin column for RNA
purification.
3. To isolate leukocytes from fresh peripheral blood, add 5%
dextran in RPMI medium to whole blood allowing the red
blood cells (RBCs) to sediment (tube A). Gently layer the
leukocyte-rich upper layer from tube A onto Ficoll-Paque
PLUS (GE Healthcare, Buckinghamshire, England) in tube B
and centrifuge for 15 min at 2500 rpm (1250 × g). After
discarding the plasma layer, transfer the “buffy coat” mononuclear
cells to tube C and wash with PBS. Discard the Ficoll layer
in tube B and retain the granulocyte pellet. Contaminating
RBCs in tube B are removed by very brief hypotonic lysis with
sterile H2O and washed with PBS. Tubes B and C now contain
purified granulocytes and mononuclear cells, respectively. The
cells can be counted using a hemocytometer. Resuspend cell
pellets (maximum 1 × 10^7 cells) in 600 μL RLT buffer containing
β-mercaptoethanol and immediately proceed to homogenization
using a QIAshredder homogenizer.
4. For cultured cells in a monolayer, aspirate the medium and
briefly incubate the cells in 0.25% trypsin. After the cells have
detached from the flask, transfer them to a centrifuge tube and
add medium to inactivate the trypsin. Cultured cells grown in
suspension do not require trypsinization. Following centrifugation
for 5 min at 300 × g, the supernatant is aspirated and the
cells washed with PBS. The cells can be counted using a hemocytometer.
Resuspend the cell pellet (maximum 1 × 10^7 cells)
in 600 μL RLT buffer containing β-mercaptoethanol and
immediately proceed to homogenization using a QIAshredder
homogenizer.
5. Transfer lysed cell pellets (from one of the protocols 1–4 above)
directly onto a QIAshredder spin column in a sterile 2 mL
collection tube and centrifuge for 2 min at full speed
(>13,000 × g). To each lysate, add 600 μL of 70% ethanol
and mix well. Load each sample onto an RNeasy mini column
(100 μg RNA maximum binding capacity) in a 2 mL collection
tube and centrifuge for 30 s (all at 10,000 rpm). Wash membrane
with 700 μL of buffer RW1 and centrifuge for 30 s.
Pipette 80 μL of freshly prepared DNase I onto the RNeasy
silica-gel membrane and leave for 15 min. Perform the following
washes: 350 μL of Buffer RW1 (30 s); 500 μL of Buffer
RPE (30 s); and 500 μL of Buffer RPE (2 min). To dry the
membrane, place the column in a new 2 mL collection tube and
centrifuge for 1 min. Finally, transfer the RNeasy column to a
new 1.5 mL collection tube, pipette 30 μL RNase-free water
directly onto the RNeasy silica-gel membrane and centrifuge
for 1 min to elute RNA. Immediately place the tube of eluted
RNA on ice and quantify using a Qubit 2.0 fluorometer (Life
Technologies, Grand Island, NY). Store at −80 °C. We have
demonstrated the utility of this method for obtaining good
quality RNA and data from tissue [13, 14] and FFPE samples
[15, 16].
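The protocols above quote centrifuge speeds both in rpm and in multiples of g, and the relation between the two depends on the rotor radius. The sketch below uses the standard relation RCF = 1.118 × 10^-5 × r × rpm², with r in centimetres. The 8.4 cm radius is an assumption inferred from the quoted pair 10,000 rpm and 9390 × g; it is not stated in the text, and the function names are hypothetical.

```python
import math

# Standard constant for RCF = k * radius_cm * rpm**2.
RCF_CONSTANT = 1.118e-5


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Assumed microcentrifuge rotor radius, inferred from the quoted values.
RADIUS_CM = 8.4

print(round(rpm_to_rcf(10_000, RADIUS_CM)))  # close to the quoted 9390 x g
print(round(rpm_to_rcf(13_500, RADIUS_CM)))  # close to the quoted 17,115 x g
print(round(rcf_to_rpm(300, RADIUS_CM)))     # rpm for the 300 x g cell spin
```

On a different rotor the same rpm gives a different force, so it is safer to set the instrument by the × g value where possible.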
3.1.2 Quality of the RNA
1. A nanospectrophotometer is often used to estimate the concentration
(A260 reading of 1 = 44 μg/ml RNA) and purity
(A260/A280 > 1.8) of an RNA preparation. However, this
method is typically inaccurate, especially for small amounts of
RNA, and is not recommended for RNA-Seq library preparation.
Therefore, for quantification we recommend more
sensitive methods such as Qubit fluorometric quantification or
quantitative RT-PCR.
2. RNA purity is a crucial factor that can affect RNA-Seq library
preparation. Special care must be taken to avoid contamination
with RNases during the purification procedure, which includes
decontamination of work areas and equipment, use of dispos-
able gloves, sterile RNAse-free plasticware and aerosol barrier
pipette tips. Although RNA purification with RNeasy removes
most cellular DNA, it is still possible to have trace amounts
remaining. The extent of DNA contamination can be detected
by running a small amount of the sample on an agarose gel or
by designing a real-time RT-PCR control experiment using a
constitutively expressed gene such as β2 microglobulin (B2M).
To address DNA contamination, further rounds of DNase I
treatment followed by RNeasy purification can be used, but
due to increased processing this is likely to decrease the total
amount of RNA.
3. To determine the integrity and size distribution of the RNA, a
small amount can be run on an Agilent Bioanalyzer. Intact
RNA will have two sharp peaks present that correspond to
28S rRNA and 18S rRNA, which should have an approximate
ratio of 2:1. A more accurate indicator of integrity, however, is
the RNA integrity number (RIN), which uses an algorithm to
incorporate several features of the RNA electropherogram (i.e.,
not just the ratio of 28S:18S ribosomal RNA) [17]. Most
applications suggest a RIN value of >8 for total RNA is desir-
able for RNA-Seq library preparation. For FFPE samples,
where a significant amount of degradation is to be expected, a
RIN value of >8 is not achievable. Furthermore, reports indi-
cate that RIN values from FFPE samples are not a reliable
predictor of successful library preparation [18–20]. To account
for this, the DV200 metric can be used, which is the percentage
of RNA fragments >200 nucleotides. Successful RNA-Seq
library preparation has been reported for a sample with a
DV200 value as low as 30% from 10-year-old FFPE tissue
[20]. In public data repositories (such as GEO or SRA, see
later sections), the RIN value is important to assess the quality
of the dataset. However, this information is often missing from
RNA-Seq datasets. RSeQC’s tin.py script could be used to
determine a measure of mRNA degradation in silico (see
Note 1 for source).
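The quality criteria in this section can be collected into a few small helpers, sketched below under stated assumptions. The 44 μg/ml factor, the A260/A280 guide value of 1.8, and the RIN guide value of 8 come from the text. The DV200 calculation from a list of (fragment size, signal) pairs is a simplified illustration rather than the instrument's own algorithm, and using 30% as an FFPE cutoff is an assumption based only on the lowest value reported as successful. All function names are hypothetical.

```python
def rna_conc_ug_per_ml(a260, dilution_factor=1.0, ug_per_ml_per_a260=44.0):
    """Concentration from absorbance, using A260 of 1 = 44 ug/ml RNA."""
    return a260 * ug_per_ml_per_a260 * dilution_factor


def a260_a280_ok(a260, a280, min_ratio=1.8):
    """True when the A260/A280 ratio exceeds the purity guide value."""
    return a260 / a280 > min_ratio


def dv200_percent(trace):
    """Percentage of total signal from fragments longer than 200 nt.

    `trace` is a list of (fragment_size_nt, signal) pairs. This simple
    binning is an illustrative assumption, not the instrument's method.
    """
    total = sum(signal for _, signal in trace)
    if total <= 0:
        raise ValueError("trace has no signal")
    above = sum(signal for size, signal in trace if size > 200)
    return 100.0 * above / total


def integrity_flag(rin=None, dv200=None, ffpe=False,
                   min_rin=8.0, min_dv200=30.0):
    """Crude pass/fail: RIN > 8 for non-FFPE RNA, DV200 for FFPE RNA.

    min_dv200 mirrors the lowest successful value mentioned in the text;
    treating it as a cutoff is an assumption.
    """
    if ffpe:
        return dv200 is not None and dv200 >= min_dv200
    return rin is not None and rin > min_rin


example_trace = [(100, 2.0), (300, 1.0), (1000, 1.0)]
print(rna_conc_ug_per_ml(0.5, dilution_factor=10))
print(dv200_percent(example_trace))
print(integrity_flag(dv200=dv200_percent(example_trace), ffpe=True))
```

For FFPE material the RIN is ignored on purpose, following the observation above that it does not reliably predict library preparation success.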
3.1.3 Preparation and Design of RNA-Seq Libraries
1. Single-end or paired end: While designing the sequencing
experiment, it is necessary to decide whether single-end
(SE) or paired end (PE) sequencing will be performed (see
design recommendations in Fig. 2). SE sequencing provides
the cDNA sequence of one end (or in one direction) for each
The Project Gutenberg eBook of Das Recht der
Hagestolze: Eine Heiratsgeschichte aus dem
Neckartal
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.
Title: Das Recht der Hagestolze: Eine Heiratsgeschichte aus dem
Neckartal
Author: Julius Wolff
Illustrator: K. Storch
Release date: December 9, 2017 [eBook #56152]
Most recently updated: October 23, 2024
Language: German
Credits: Produced by The Online Distributed Proofreading Team at
http://www.pgdp.net
*** START OF THE PROJECT GUTENBERG EBOOK DAS RECHT DER
HAGESTOLZE: EINE HEIRATSGESCHICHTE AUS DEM NECKARTAL
***
Transcriber's Notes
The original is set in Fraktur. Text that is letter-spaced or italic in the original is rendered
like this. Text set in Antiqua in the original is marked like this.
Further transcriber's notes can be found at the end of the book.
EDITION FOR THE
DEUTSCHE BUCH-GEMEINSCHAFT G.M.B.H. BERLIN
Julius Wolff
Complete Works
Fourth Volume
Das Recht der Hagestolze
Section 1: Novels (Volumes 1–3)
Das Recht der Hagestolze
Eine Heiratsgeschichte aus dem
Neckartal
by
Julius Wolff
*
With illustrations
by
K. Storch
Paul List Verlag Leipzig
All rights reserved
Copyright 1912 by Paul List, Leipzig
Printed by Dr. Trenkler & Co. in Leipzig
Das Recht der Hagestolze
Erstes Kapitel.
It was in the already far advanced dusk of a warm spring evening in
the year 1397 that a monk, coming along alone, walked across the
Neckar bridge at Heidelberg toward the town gate. His tall figure was
shrouded in the brown habit, and the hood was drawn so low over his
bowed head that nothing could be seen of his face either. He held his
arms pressed close to his body and his hands folded in front of it, and
his gait had something unsteady and swaying about it, as if he,
unaccustomed to it, were creeping on tiptoe. Whether he walked in
sandals or in shoes could not be made out, for his feet too were
covered by the garment.
The gate was still open, but the gatekeeper was just stepping out of
the guardroom to close it when the monk passed through. The
watchman measured the newcomer with a mistrustful look and
seemed about to put a question to him, when the monk raised his
right hand and made the sign of the cross over him. But the
movement came out somewhat clumsy, almost uncouth; in the two
intersecting directions, from top to bottom and from right to left, the
hand swung so high and wide as if it meant to bestow on the sinful
layman not the holy sign of blessing but a couple of hearty blows,
which fortunately did not strike his crown and cheek but passed
through the air without meeting resistance. The watchman
suppressed his question, and the monk too remained silent and
strode hurriedly along the lane into the town. Looking after him, the
gatekeeper shook his head and growled: "The reverend brother
seems to wield a right sturdy fist when giving his blessing. What can
he be looking for here in town at night? For he was no Heidelberg
Franciscan; I should have asked him after all!" With that the man had
closed the heavy leaves of the gate and now pushed the iron bolts
home. Then he cast one more glance up at the sky, at the gray,
fast-moving clouds, and went back into the little guardhouse.
In the narrow, already rather dark lane the monk's bearing and
movement were quite different from what they had been when he
passed through the gate. His figure straightened, he carried his head
high and erect, swung his arms freely at his sides, and his steps were
firm and long. Only a few people met him, before whom, especially
in the scant light, he perhaps did not think it worth the trouble to
make a show of monastic humility.
Now a troop of students came toward him with loud jesting and
laughter, walking in pairs at short intervals from one another. As the
monk passed the first pair, one of the two students stopped, turned
around, and said to the other: "Did you hear that, Mutz? The
baldpate had a step as if he wore spurs on his sandals."
"Nonsense! Spurs!" replied his companion, "it was probably the
devil, and you heard his horse's hoof clattering."
Laughing, they went on. Of the students who came last, walking four
abreast, one, a big, burly fellow, bumped hard against the Franciscan
and laughed: "Holla, mi frater! Have you shoulders of oak?"
But another snapped fiercely at the monk: "Out of the way, bat!
Or I'll slap you against the wall so that you stick there!"
In a firm, rough voice, almost threateningly, the man thus accosted
replied: "Pax vobiscum!" and, making room and walking on, added
under his breath: "Or may cross, hail, and thunderbolt strike you
behind the knees!"
"What does the marmot want?" cried the students; "come, let us
beat the dust out of his habit!"
For a moment it seemed as if the monk would stop and turn on his
assailants; but he thought better of it and hurried away. The students
too, at the soothing urging of one of their own, went their way
laughing and jeering.
"If only I had you greenhorns outside the gate this minute!" the
hooded man ground out, clenching his fist. At the crossing of two
streets he stood irresolute until a woman came along, whom he asked
in his mildest tone: "Can you not tell me, good woman, where the
worthy, most learned Magister Doctor Christoph Wiederhold lives?"
"Gladly, reverend father!" replied the woman. "You must be a
stranger here, for every child in Heidelberg knows Doctor
Wiederhold. Just go up the lane here to the right; it is the fifth, sixth,
no, the seventh house. The iron door knocker is a dog with three
heads; you can feel it with your hand if you can no longer see it."
"Thank you, good woman!" said the wearer of the habit and strode
into the lane, while the woman stood a while longer and gazed after
him in wonder until he vanished in the dark. This time the reverend
Franciscan brother had forgotten to make the sign of the cross over
the obliging woman.
He soon found the house he sought and set the door knocker
described to him into loudly resounding motion. A maid opened and
led the peaceable man of God, without hesitation or misgiving, up
the stairs to the master of the house.
The Herr Magister Doctor juris Christoph Wiederhold sat in his study
at a table on which an oil lamp burned, bent over writings and
parchments, and looked up in astonishment at the strange visit at so
late an hour. There in the middle of the low chamber, almost ghostly
to behold in the dim glow of the lamp, stood a tall, wholly muffled
figure which, without speaking a word, kept two flashing eyes fixed
on him from under the hood. The small, slight man felt uneasy, and
his question sounded timid: "How can I serve you, reverend
brother?"
"He who asks for counsel and trust should also show trust. I am no
monk, though I have good reason to wish to be taken for nothing but
a monk in the streets of this town." So spoke the stranger in a
deep-toned voice, threw back the hood, and revealed to the now still
more astonished scholar a grave, proud-looking face and a head
carried high, framed by beard and flowing hair, which was covered
with a gleaming steel cap. "I come, most honorable Herr Magister,"
he then continued, "to ask your learned counsel in an important
family matter."
"Take a seat, noble sir!" replied the doctor, pointing to a second
chair that stood opposite his own seat.
"I should like to withhold my name from you, for it has no bearing
on the matter, and it is really only one question that I have to put to
you, on which everything I need to know turns," said the unknown
man.
The doctor nodded and, sitting comfortably in his high chair, his
elbow on the armrest and his chin between thumb and forefinger,
looked with rapt attention into the face of the speaker, who at once
went on: "My question is this: Is there a right, a law, by which the
sovereign has a claim to the inheritance, to the estate in goods and
chattels, of an unattached, unmarried man?"
"You mean the jus misogamorum, the Bachelors' Law, which ought
really to be called the prince's right to the bachelors' inheritance,"
replied the scholar without long reflection. "Indeed, sir, there is such
a law, and I can give you all the information about it that you may
wish." Thereupon he rose, rummaged in a cupboard among his
papers, and soon found a booklet, which he laid before him on the
table, leafing and searching in it. "Here it stands," he then said,
pointing with his finger at the page: "misogamus amittit jus et
potestatem testandi, a bachelor loses his right to bequeath and must
leave his property to the authorities of the place where he has his
domicilium; he is therefore unable, by a testamentum or other last
will, to assign and bequeath his goods either to his blood relations or
to other people."
"Hm! Hm!" the stranger burst out; "is there no way to twist and
turn that?"
The doctor shook his head and spoke, now freely from memory, now
reading aloud: "What the defunctus leaves behind, the fiscus takes,
even if the former sold his goods and chattels wholly or in part
during his lifetime and took the money for himself. For if it can be
learned that the bachelor, soon to die, has in fraudem fisci removed
some of his goods or ready money elsewhere, the same must be
brought back ad locum domicilii, where he kept house and died.
However, such confiscatio commonly does not apply to all the
bachelor's property, but only to his well-gotten goods, which he
himself has earned, saved, and laid by in his station, livelihood,
trade, and labor, and not to his inherited or ancestral estates, nor to
his fiefs."
"Aha! That sounds better already!" said the guest.
"Yes, so it is written," replied the doctor, "but, noble sir, the law is
variously interpreted and applied, and it has already happened in
these parts that the inherited estate of a deceased bachelor was also
seized. Should it, by the way, be quite unknown to you that our most
gracious Elector, Count Palatine Ruprecht the Second, has a very
strong craving for landed property and therefore per fas et nefas –"
"That I perhaps know better than you, worthy Herr Doctor!" the
other interrupted him with a peculiar smile. "But what, then, is your
advice for preventing such shameful confiscatio?"
"If your friend – or is it you yourself?"
"No, it is my brother."
"If, then, your brother departs this life, that is to say, dies, the
fiscus inherits from him; nothing can be changed about that, noble
sir. Is he gravely ill?"
"Heaven forbid! He stands, thank God, on two very sound feet."
"How old is he, then?"
"Forty-nine years."
"Only forty-nine? Not yet fifty?" cried the doctor eagerly. "Well,
then nothing is lost or ruined yet! For know, noble sir, that the
Bachelors' Law gains force and validity only when the testator is fifty
years, three months, and two days old."
"I have been told that too, but what good is it?"
"Your brother must marry!"
"Marry!" laughed the guest, "him, marry!"
"Yes, if he dies an unmarried man after passing the aforesaid age
limit, his goods and chattels are irretrievably lost to you. But he can
still take a wife later, and even if he should then no longer be blessed
with offspring, his nearest blood relations will inherit from him, and
not the Count Palatine."
"Are you sure and certain of that?"
"Beyond all doubt and error!"
The stranger stood up, paced thoughtfully back and forth a few
clinking steps in the small chamber, then drew a little leather pouch
from under his habit and laid two shining gold gulden on the table.
"I give you all due thanks, Herr Doctor Christoph Wiederhold!" he
then said, "fare you well!" And drawing the hood over his head again,
he strode out the door, gladly allowing the Magister to light him
down the stairs with the lamp and to unlock the front door himself.
Ill-humored, and as quickly as the almost complete darkness
allowed, the muffled man hurried to the gate and knocked the
watchman out. The latter came from his little room with the keys and
shone a dimly burning horn lantern on the man demanding to be let
out. "Ah, it is you, reverend brother! Well, have you brought your
business in our good town to a close to your satisfaction?" he asked
in a submissive tone, in expectation of a good gate penny, while he
searched for the keyhole of the small night wicket that could be
opened for those on foot in the great leaf of the gate.
"What is my satisfaction to you?" the monk snapped at him. "I am
in no mood to answer to you. Get on! Open the door! Or may cross,
hail, and thunderbolt strike you –"
"– behind the knees!" the watchman broke in laughing, as he
unlocked the little gate. "I know that saying, Herr Bligger Landschad
von Steinach!"
"From where, you scoundrel?"
"I learned it from many a carter whose load you lightened on the
road without his thanks, Sir Knight!" retorted the watchman
defiantly.
"You shall have something more from me!" said the man thus
mocked, and the watchman felt a blow of the fist on the back of his
neck that made him stagger, while the other escaped through the
wicket into the open.
But scarcely was the knight on the bridge, on which a somewhat
brighter light fell from the parting clouds, when he heard behind him
the loud alarm that the watchman blew on his horn. He quickened his
steps and, as he walked, stripped off the monk's habit and hung it
over his arm. In the mail shirt he wore he could now stride more
freely and swiftly, and so he did, his left hand on the hilt of his sword.
Now he let a shrill whistle ring out on his finger, whereupon from no
great distance the same note sounded in answer. Then the hoofbeats
of two horses quickly approached, and soon a rider, likewise in mail
and armed, halted before him, leading a riderless horse by the rein.
"Well, how do matters stand?" asked the rider.
"He must marry, there is no other way out!" replied Herr Bligger as
he swung himself into the saddle. "But now, onward! The gatekeeper
recognized me and is raising the alarm; we shall soon have them
behind us, and there the moon is already coming out."
The riders gave their steeds the spurs and galloped along the road
upstream on the bank of the Neckar. –
The gatekeeper had not been mistaken and had called the seeming
monk by his right name, which was very well known in Heidelberg
but not held in particularly good repute, as the bearer of that name
knew full well.
The lords of Steinach were a knightly family whose origin, like that
of so many noble houses, was shrouded in darkness, but of which
documents from the first half of the twelfth century already speak.
They enjoyed widespread esteem, possessed great estates, and had
often held court offices and high church dignities. One of them, also
a Bligger von Steinach, had been a famous minnesinger who
flourished in the first quarter of the thirteenth century. Probably in
his honor his descendants bore a black harp on a golden field in their
coat of arms. The fame, however, that the later generations had won
was of a somewhat doubtful and disreputable nature, for they lived
mostly by highway robbery, and many a ship on the Neckar, many a
freight wagon traveling on the highroad from Heilbronn to
Heidelberg or in the opposite direction, had been made to feel their
boldly grasping hand. One of them, named Ulrich, had plied the
robber's trade so outrageously that the people, because he did the
land such great harm, gave him the abusive name "Landschaden"
(bane of the land), and the emperor placed him under the imperial
ban. Outlawed as he now was, he took part in a campaign against the
infidels, struck off the head of a dreaded Saracen leader, and brought
it to the emperor in atonement, so that the emperor took him back
into favor and permitted him to bear a crowned Saracen's head as a
crest in his arms. The name Landschaden, however, he and his
descendants kept for all time, and the valiant swordsmen also
continued to see to it by their deeds and doings that the meaning of
this name did not fall into oblivion.
At present three brothers of the family were living: Bligger, the
eldest; Konrad, the youngest, both married and blessed with
children; and, in age midway between these two, Hans, that bachelor
for whose sake Bligger had ventured into the town so hostile to him.
These three brothers owned four castles, which stood close together
on the hills of the right bank of the river above the little town of
Neckarsteinach. Bligger lived in the largest, the Mittelburg, which
was joined to the small Vorderburg by a drawbridge; Konrad dwelt in
the Hinterburg, and Hans, finally, at Burg Schadeck, which the
people also called the Schwalbennest (Swallow's Nest), because it
hung high, free, and bold above the valley like a nest glued to a sheer
rock.
There they lived by no means lonely and secluded, without
neighbors of equal birth and like mind; rather, within the next four or
five miles upstream from Neckarsteinach, the wooded heights on
both sides of the much-winding valley were crowned with stately
castles inhabited by knightly families, in such number on so small a
space as could be found nowhere else, not even on the Rhine.
Opposite Neckarsteinach, on a high cone, lay the fortress of Dilsberg,
the seat of the Electoral Palatine district count over the Kraichgau,
Enzgau, and Elsenzgau; then, upstream, followed the castles of
Hirschhorn, Eberbach, Stolzeneck, Zwingenberg, Minneburg,
Dauchstein, Hornberg, Horneck, Guttenberg, and Ehrenberg, each
mightier than the last, and each endowed with villages and farms
and broad forests as its own property or granted as a hereditary fief.
The mightiest and richest, but also the most feared, of all the castle
lords of the Neckar valley were the Landschaden von Steinach, and
even though Herr Bligger stole into the town of Heidelberg only by
night and disguised as a monk, it was still a very risky game for him,
for he had a bad score chalked up against him there. Therefore the
two brothers now sought to get away as quickly as possible and rode
homeward at a brisk trot through the night, which the moon lit more
and more, without speaking to each other, each occupied with his
own thoughts.
The daring knight's secret visit to Heidelberg, however, had the
following occasion. On the previous day, in Bligger's absence, a Jew
who called himself Isaak Zachäus of Ingolstadt had called at the
Mittelburg in the company of his son, a handsome, dark-eyed youth,
had presented himself as a physician for man and beast, had asked
whether perhaps the one or the other here stood in need of his art
and his much-tried counsel, and had finally offered to cast the
horoscope of the castle's inhabitants, for he was also well versed and
experienced in astrology and higher geometry. The lady of the castle
had gladly agreed, and the astrologer had asked her for the day and
hour of birth of all the members of the family in order to make his
calculations from them. Now Frau Katharina possessed an old prayer
book that had belonged to her husband's mother, in which the latter
had recorded with her own hand all important family events, and
thus also the births of her children. She fetched it and assisted the
itinerant sage from the Orient with the necessary dates. He had
thereupon spent the whole day in a solitary chamber, well provided
for, writing, reckoning, and drawing all sorts of strange figures, until
he could tell the lady of the castle the results of his inquiries. Not
much had come of it, however; the Hebrew had announced to the
lady nothing but favorable or meaningless prophecies for the future
of all her kin, prophecies from which neither unusual good fortune
could be hoped nor any particular disaster feared. Only about her
brother-in-law Hans had he made a strange-sounding
pronouncement, for he asserted: "Junker Hans will one day find his
happiness in a cloister." At that Frau Katharina had laughed heartily
at the soothsayer. Hans, the most knightly and fun-loving of the three
brothers, who liked best to sit in the saddle or over his cup or to stalk
game in the forest, and who told the most edifying and adventurous
stories of his friend the abbot of the Benedictine monastery of
Sinsheim and its conventuals, whom he often visited for days on end:
he, of all people, was himself one day to enter a cloister? Impossible!
Quite unthinkable! But Isaak Zachäus had remained cool and grave in
his pronouncement in the face of her humorous objections and had
added: "Junker Hans is forty-nine years old, and if he does not marry
within a year and a day, all his goods and chattels will, under the
Bachelors' Law, fall as an inheritance to your most gracious Count
Palatine."
That had given the lady of the castle pause; she had never before
heard of such a law, inquired further into it, and had the Jew explain
it to her precisely. Shortly afterward Herr Bligger had come home
and was just as astonished at the unheard-of news as his excellent
wife.
What did these fearless knights, young nobles, and squires, always
ready to strike out roughly, know of law and legal customs? At most
they concerned themselves a little with feudal law; for the rest they
recognized only the law of the fist and settled all quarrels with the
sword. The case of one of their companions dying a bachelor had not
occurred within the whole range of their acquaintance in living
memory, and so they had no inkling of a so-called Bachelors' Law.
Herr Bligger, however, resolved at once to get to the bottom of the
matter and to consult a jurist of the young University of Heidelberg
about it the very next day, but to detain the astrologer together with
his boy at the castle until his return. He enjoined strict silence on his
wife, especially toward brother Hans, and the next morning took only
his brother Konrad into his confidence, in whose company he
undertook the ride to the hostile town. Now that the Magister had
confirmed the Jew's statements on every point, the matter weighed
heavily on his mind. He brooded over it during the whole ride, and
Konrad did not want to interrupt him just then with untimely
questions. Only close to the parting of the ways to their castles did
Bligger at last ask his brother: "What have you been thinking about
on the way, Konrad?"
"Naturally of nothing else," replied Konrad, "than how we are to
manage to get Hans married."
"That was my only thought too," said Bligger, "but I can come to
no conclusion with it. My opinion is that we call our friends together
and take counsel whether we cannot jointly accomplish something
against this accursed Bachelors' Law."
"And Hans?"
"Hans is with his friends in Sinsheim and will, I hope, stay away a
few more days; that is why the matter brooks no delay, for he must
notice nothing of it. Ride tomorrow to Hirschhorn and Eberbach and
invite Otto and Schenk von Erbach to a meeting at my castle the day
after tomorrow; I shall send Ernst with the same errand to
Zwingenberg to Engelhard and to Stolzeneck to Albrecht von
Erlickheim."
"Good! I will ride," said Konrad, "but the council will lead to no
other conclusion than the one you brought back from the Heidelberg
doctor: Hans must marry!"
"Yes, but just tell me whom?!" replied Bligger. "In my thoughts I
have already gone looking for a bride for him, but in vain. For our
young castle damsels round about he is too old; in Heilbronn or
Heidelberg he dare not show his face; and as for trotting about the
Empire a-wooing, we shall certainly never bring him to that. I know
of only one whom he could take, if he would and if she would; that
would be –"
"Juliane Rüdt von Kollenberg, the proud lady of the Minneburg,"
Konrad broke in, and burst into ringing laughter, in which Bligger
heartily joined.
They both shook their heads and fell silent again until they had to
part and wished each other good night. When Bligger had already
ridden a little way uphill toward his castle, he heard once more from
afar the loud laughter of his brother Konrad, which carried over to
him through the stillness of the night, and then he too had to laugh
again so that his brother could hear it; and lo and behold, at the same
moment his steed neighed because it was looking forward to the
stable; but it sounded as if the knight's stallion too had to laugh at
the thought that Junker Hans should one day lead home Frau Juliane
Rüdt von Kollenberg of the Minneburg.
Chapter Two.
That very night after his return from Heidelberg, Herr Bligger gave
his noble wife Katharina a brief account of his conversation with the
jurist and of his intention to invite some friendly castle lords from
the Neckar valley to a council on joint steps in the matter; she should
prepare to entertain the knightly guests well the day after tomorrow,
and for the rest restrain her curiosity until the next day and ask him
nothing more now, but let him sleep.
The next morning, right after breakfast, Konrad rode off, and soon
afterward Ernst too, Bligger and Katharina's eldest son, twenty-three
years old, to the neighboring castles to deliver the invitations of the
head of the family. Bligger had not taken his son into his confidence,
for in view of the very close and intimate relationship between him
and Junker Hans, he feared that Ernst might give his adoringly loved
uncle a hint, as a result of which Hans, with his willful nature, might
cross Bligger's plans by some unpredictable countermove and
perhaps thwart them altogether. From the errand given him, Ernst
had to believe that it concerned the arranging of a larger, joint feud,
which for the time being had to remain shrouded in deep secrecy.
When the two spouses were alone, they took up again the
conversation broken off in the night, and Frau Katharina began: "So
the worthy Jew did not deceive us after all with what he told us
about that unheard-of right or wrong, and we can probably let him
go on his way today with his boy."
"No, not yet!" replied Bligger. "I have already spoken with him and
told him to stay here longer; for I have a dim presentiment that he
could still render us good service in our care for Hans, though I do
not yet know in what way."
"What sort of service could that be?" said Katharina. "If the
Heidelberg doctor could not advise and help you, the astrologer will
be even less able to."
"Who knows, Käthe!" answered the knight. "Such an old sly fox of a
Hebrew knows every trick."
"Against the Count Palatine he can do nothing, surely. Or is he
perhaps to tell our bachelor brother what he worked out in his
horoscope?"
"The horoscope! I have it!" cried Bligger. "The Jew must cast
Juliane's horoscope and tell us what he finds!"
"Whose? Juliane's? Juliane of the Minneburg?" asked Katharina in
the greatest astonishment.
"Certainly! Who else?" replied Bligger. "The simplest and surest
means of escaping this good-for-nothing Bachelors' Law is and
remains that Hans marries, and now strain your wits to see whether
you can find another whom he could marry than Juliane Rüdt von
Kollenberg!"
"A foolhardy thought, Bligger!" said Katharina, laughing. "As if the
mistress of the Minneburg would ever consent to give her hand in
marriage to a Landschad von Steinach, one of her bitterest enemies!"
"Bitter enemies have many a time turned into the best of friends,"
rejoined Bligger. "And just think what a jest it would be, Käthe, if
Hans were after all to get for his wife his old love, whom he did not
marry back then only for fear of the mother-in-law!"
"Oh, you need not reach back so far. It is perhaps three, at most
four years ago that it sometimes seemed to me as if Hans stood on a
much more intimate footing with the fair Juliane than her late
Zeisolf, who was then still living in good peace with you, was allowed
to know or suspect," remarked his wife.
"So much the better!" said Bligger. "This fact, if it is one, and I hear
of it today for the first time, can only strengthen me in my hope."
"What I hinted at fell, I repeat, in the time before your quarrel,"
replied Katharina. "Do not forget what has happened in the
meantime, and how irreconcilable Juliane has shown herself toward
us even after Zeisolf's death. Since then everything between her and
Hans has been over, and perhaps she does not even know that he had
already cast an eye on her before her marriage."
"Nor is that necessary. If only the two now fall a little in love with
each other, or, by your observations, fall in love with each other
again, we shall yet bring them happily under one blanket and thumb
a nose this long at the Count Palatine," laughed the knight with a
corresponding gesture of the hand.
"How do you mean to bring that about?" asked Katharina. "Hans,
the sworn enemy of marriage, and the clever, proud Juliane: dear old
man, you are dreaming."
"Pah! She is a woman, and, I think, one with right warm blood,"
answered Bligger. "Do you not believe that she would very gladly
exchange her widowhood for a more joyful existence?"
"Today rather than tomorrow she would do so, of that there is no
doubt," Katharina had to concede, "but a Frau Landschad von
Steinach she will not become, however knightly and highly
acceptable a husband you could give her in brother Hans."
"I should think so!" said Bligger. "Hans is still a man like a youth,
in spite of his forty-nine years. Only he must notice nothing;
unsuspecting he must walk into the trap into which we lure him for
his own good, and the bait with which we catch them both, the Jew
shall brew for me; he looks to me just as if he could work magic."
"A love potion?"
"No, no, no love potion! Just leave it to me!" said Bligger, pacing
rapidly up and down the room and waving his hand at his wife as if
to quiet her, as though all sorts of cunning thoughts were rising in
him in which she should not disturb him. "Where is the Jew?" he
then asked; "I am going to have a word with him in private."
"I have given him and his shy boy a comfortable room; they are
lodged in the gable room," replied Katharina.
"You did right," said the knight. "Look after them well! In general I
wish everyone here to treat them with more gentleness and
friendliness than people are otherwise accustomed to show to Jews.
And now think of tomorrow, so that your kitchen does you credit; you
know that Engelhard wields a sharp blade at table too."
"Never fear, old man! You shall be satisfied," the wife called after
her husband as he hurried out. But when she was alone, she said to
herself: "So Hans, the hater of marriage, is to marry! It is really quite
right, and one can only wish that it succeeds. As a woman I cannot
wholly condemn the Bachelors' Law. A capable man's strength should
create and enjoy the happiness of love, instead of growing more and
more lonely in the world with his heart. How does Bligger's ancestor,
the minnesinger, put it?
A man's strength would be good
If he let it appear in rightful things;
But many a one is so disposed
That he grieves himself and his own with hate."
Katharina was a stately figure, strong and healthy, with lively
movements and a still pretty, intelligent face, whose fresh coloring
was set off all the more by the early silver-gray of the hair at her
forehead and temples.
She sat down at the open window and looked musingly down into
the sunlit valley. The forest all around, on the gentle slopes, on the
hills and mountains, wore its light green spring garment. The nimble
waves of the Neckar, whose course here described a wide bend,
gleamed and flashed in the morning light; the swallows circled the
castle, finches and thrushes sang in the bushes of the steep slope,
and from below rose the yearning song of the nightingale. It was a
delicious day, as if made for resting and dreaming and blissful
enjoyment. Though idle dreaming was otherwise not Frau
Katharina's way, she could not escape the spell of this blessed peace
that lay spread over the whole lovely valley scene. She breathed in
the fragrant air with delight and remained sitting a long time, her
gaze directed as if lost into the distance, her thoughts sunk in times
gone by.
The lords of the castles of the Neckar valley generally held together
in good harmony, visited one another with their wives and children,
gave one another merry feasts, round dances and drinking bouts,
ring-tilting and jousting, nor did they disturb one another in their
knightly trade, but rather helped one another in it, and whenever a
particularly rich booty had been won, they would share it in
brotherly fashion among themselves. Though now and then small
quarrels arose between two of them, in which rather rough words
easily fell and perhaps even a pair of blades crossed, such a dispute
as a rule did not last long. Those not involved strove to mediate
peace, and all resentment was washed away without a trace by a
thorough draught of reconciliation, in which even those in no need of
reconciliation joined bravely. And if one of them was threatened by
danger from an opponent outside this circle, all at once stood up for
the one, to a man, and