
2015 Second International Conference on Advances in Computing and Communication Engineering

Efficient Read Alignment Using Burrows Wheeler Transform and Wavelet Tree

Sanjeev Kumar, Department of CSE, MNNIT, Allahabad, U.P., India
Suneeta Agarwal, Department of CSE, MNNIT, Allahabad, U.P., India
Rajesh Prasad, Department of Computer Science, Yobe State University, Damaturu, Yobe State, Nigeria

Abstract: In the genome sequence alignment problem, a reference string and a number of query strings, referred to as short reads, are given; the goal is to find the occurrences of these query strings in the reference string. The huge number of reads generated by new sequencing technologies (Illumina/Solexa) calls for an efficient algorithm that requires both less memory and less computational time. There are a number of indexing and string matching techniques to align short reads to a reference string (genome), but the index of the reference string is large in each of the existing techniques. In this paper, a new self-compressed index technique (BWT-WT) is proposed. The BWT-WT scheme is based on the Burrows-Wheeler Transform (BWT) and the Wavelet Tree (WT), and it supports exact alignment of DNA sequence reads. The performance of BWT-WT is compared with that of other BWT-based short read alignment tools. Experiments show that the BWT-WT based program achieves more compression and faster searching than the other existing tools.

Keywords: Burrows-Wheeler Transform, FM Index, Full Text Index, Wavelet Tree and Sequence Analysis.

I. INTRODUCTION

The next generation sequencing machine Illumina/Solexa generates millions of short DNA sequence reads in a single run. These reads must be mapped to one or more reference genomes. The orientation of a read relative to the genome is not known. The main problem is how to align the reads to the reference genome, accounting for exact matching, within a reasonable amount of time and memory space. Short read alignment is used in a number of applications. Examples include assembling reads into a genome, aligning reads to one reference genome for analysis of genomic variation, and aligning a microbiome to a set of reference genomes for species or functional analysis.

Searching biological sequences in genomes and proteins is important for understanding the genetic blueprint of living organisms. This has resulted in the fast development of new technologies that generate vast amounts of sequence data to be analyzed [7]. For this reason, the focus has shifted from data acquisition to efficient data storage and processing methods. To regain the original ordering of the reads, they are often aligned to a reference genome, and the massive number of sequences to be processed requires efficient search schemes and data structures.

A lot of effort has been made to develop methods that are both memory efficient and fast. One approach to derive suitable data structures is the Burrows-Wheeler Transform (BWT), which can be understood as a rearrangement of the characters in a sequence. It has been integrated in several bioinformatics tools for read mapping, e.g. SOAP [9], BWA [5], BWA-SW [13] and Bowtie [11].

Four categories of alignment programs are currently used to map short read sequences. The first category is based on hashing of the read sequences, such as RMAP (Smith et al., 2008), MAQ (Li et al., 2008(1)) and ZOOM (Li et al., 2008(2)). These programs have flexible memory requirements but do not support gapped alignment or multithreading. The second category is based on hashing of the reference genome, such as SOAP (Li et al., 2008(3)) and BFAST [14]. Programs of this category support multithreading for the alignment of reads, but the size of the reference genome index is very large. The third category is based on merge sorting of the reference genome as well as of the read sequences, such as Malhis (Malhis et al., 2009), but these programs are not very popular because they do not support paired-end mapping. The fourth category is based on the Burrows-Wheeler Transform (BWT, 1994), which is efficient in both memory footprint and speed. Some of the programs of this category are BWA [5], Bowtie [11] and SOAP [9].

Programs of the fourth category have a relatively small memory footprint, are efficient in searching, and support exact matching as well as inexact matching with a bounded number of allowed differences. Exact matching with these programs takes a few seconds to align the reads, but inexact matching takes much longer because all similar substrings must be found. In DNA profiling, where multiple reference genomes are used for the analysis and identification of gene behaviour, the size of the index again becomes an issue, so a reduction of the index size is required. As a result, there is a need for an efficient program that requires less memory and computation time.

BWT [2] based algorithms use a number of external tools, such as move-to-front encoding (which rearranges the characters so that similar characters are grouped), run-length coding and variable-length coding, to compress the reference sequence.

The Wavelet Tree [4] was invented in 2003 by Grossi, Gupta and Vitter as a data structure to represent a sequence and answer some queries on it. It is a milestone in compressed full-text indexing because it adapts well to the compressibility of the data. Two key approaches to achieve this are using specific coding (entropy coding) on the bitmaps and modifying the tree shape.

In this paper, a new self-compressed index supporting exact alignment of DNA sequence reads is proposed, which is based on the BWT and the Wavelet Tree. The advantage of BWT-WT is that it provides an index of optimal size and supports a number of

978-1-4799-1734-1/15 $31.00 © 2015 IEEE 133

DOI 10.1109/ICACCE.2015.80

Authorized licensed use limited to: UNIVERSITI TEKNOLOGI MARA. Downloaded on April 05,2022 at 07:43:16 UTC from IEEE Xplore. Restrictions apply.
queries such as access, rank and select in constant time. The performance of BWT-WT for DNA sequence alignment is compared with that of other BWT-based tools. Experiments show that the BWT-WT based program achieves more compression and more efficient searching than the other techniques.

This paper is organized as follows. Sec. II describes the related concepts. Sec. III presents the proposed compression and indexing techniques based on the Burrows-Wheeler Transform and the Wavelet Tree. Sec. IV presents the experimental setup and the analysis of the results. Finally, Sec. V concludes the paper.

II. RELATED CONCEPTS

A. Suffix Tree and Suffix Array

The suffix tree [1] is an important data structure in string processing. It plays a prominent role in algorithms but is not as prevalent in actual implementations of software tools. There are two major reasons for this. The first is space consumption: the suffix tree requires quite a large amount of space, even though its performance is asymptotically linear. The second is that the suffix tree has poor locality of memory reference, which causes a significant loss of efficiency on processor architectures with caches.

The suffix array was introduced by Manber and Myers [6] as a simple, space-efficient indexing alternative to suffix trees. It is a key data structure for solving a number of problems in data compression and information retrieval for biological sequence analysis and pattern discovery. It is defined as the permutation of index numbers giving the starting positions of the suffixes of a given string in alphabetical order. Table I shows the suffix array for the string "AGCAGT$".

TABLE I. SUFFIX ARRAY FOR TEXT T=AGCAGT$

Suffixes            Ordered suffixes
i  Suffix           i  S[i]  Ssuf
0  AGCAGT$          0  6     $
1  GCAGT$           1  0     AGCAGT$
2  CAGT$            2  3     AGT$
3  AGT$             3  2     CAGT$
4  GT$              4  1     GCAGT$
5  T$               5  4     GT$
6  $                6  5     T$

B. Burrows Wheeler Transform

DNA sequencing algorithms based on the Burrows Wheeler Transform (BWT) [2] are widely used in genome sequencing analysis. The main concept of the BWT is to sort all rotations of a given string in lexical order, forming the BWM (Burrows Wheeler Matrix), and then return the last column as the result. This last column, i.e. the BWT string, can be compressed easily because it groups many repeated characters together. The BWT also allows fast string matching on compressed text. It is implemented by the following steps:

1. Derive a conceptual matrix M whose rows are the n cyclic shifts of the text T, n being the length of the text.
2. Lexicographically sort the rows of M; the resulting matrix is called the BWM.
3. Construct the transformed text Tbwt by taking the last column of the BWM.

The transformed text Tbwt in the last column is also denoted as L (last). The first column of the BWM, F (first), is obtained by lexicographically sorting the characters of T. Fig. 1 shows the construction of the BWT.

Cyclic shifting         After sorting (BWT matrix)
Index  Rotation         Index  Rotation
0      AGCAGT$          0      $AGCAGT
1      GCAGT$A          1      AGCAGT$
2      CAGT$AG          2      AGT$AGC
3      AGT$AGC          3      CAGT$AG
4      GT$AGCA          4      GCAGT$A
5      T$AGCAG          5      GT$AGCA
6      $AGCAGT          6      T$AGCAG

Fig. 1. Construction of Burrows Wheeler Transform Matrix for Text T=AGCAGT$. TBWT=T$CGAAG

C. FM-Index

In 2000, six years after the BWT was published, Paolo Ferragina and Giovanni Manzini [3] published a paper describing how the BWT, together with some small auxiliary data structures, can be used as a space-efficient index of the reference string T. They named it the FM Index. Just as the Last-to-First mapping [3, 8] is the key to understanding how the BWT is reversible, it is also the key to how the BWT can be used as an index.

D. Wavelet Tree

A wavelet tree [4] is a binary tree of bit strings that represents a given text T. For an alphabet Σ and a text of length n, the tree needs O(n log2 |Σ|) bits of storage and supports the determination of the character at a specified position in O(log |Σ|) time. In addition, it returns the number of occurrences of a given character up to a specified position in O(log |Σ|) time. Fig. 2 shows the wavelet tree for AGCAGT$.

E. Existing Technique

There are a number of techniques for short read alignment to a reference genome, such as MAQ, BWA, Bowtie and SOAP. The Burrows Wheeler Aligner (BWA) [5] performs short read alignment based on the Burrows Wheeler Transform and the FM Index [3]. In BWA, an index based on the BWT and the suffix array is created. To search efficiently, BWA uses the FM Index [3, 7], which is based on the backward search method. The FM Index uses auxiliary data structures, namely a count table and an occurrence table, to perform the search operation. The count table stores, for each character c, the number of characters in the string that are smaller than c; the occurrence table stores the rank of each character, i.e. the number of its occurrences up to each position of the BWT string. The full suffix array and occurrence table are too large, so only sampled values are stored and the other values are calculated on demand. Exact matching uses two functions: Count returns the number of occurrences of a pattern P in the text T, whereas Locate returns the positions of P in T.
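The structures introduced in this section can be reproduced for the running example T = AGCAGT$ with a short Python sketch. It is only an illustration under simplifying assumptions: the function names are hypothetical, the suffix array and the BWT are built by naive sorting, and the occurrence function scans the BWT string instead of using the sampled tables that tools such as BWA store.

```python
def suffix_array(text):
    """Starting positions (0-based) of the suffixes of text in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_rotations(text):
    """Last column of the sorted cyclic-rotation matrix (the BWM)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)


def count_table(text):
    """C[c] = number of characters in text that are strictly smaller than c."""
    table, total = {}, 0
    for ch in sorted(set(text)):
        table[ch] = total
        total += text.count(ch)
    return table


def occ(bwt, ch, i):
    """Number of occurrences of ch among the first i characters of bwt."""
    return bwt[:i].count(ch)


text = "AGCAGT$"
sa = suffix_array(text)
bwt = bwt_from_rotations(text)
C = count_table(text)

print(sa)   # [6, 0, 3, 2, 1, 4, 5], the S[i] column of Table I
print(bwt)  # T$CGAAG, the last column in Fig. 1
print(C)    # {'$': 0, 'A': 1, 'C': 3, 'G': 4, 'T': 6}
```

The same BWT string can also be read off the suffix array, since the character in row i of the last column is the one that precedes suffix S[i] in T (wrapping around for the suffix that starts at position 0).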

Fig. 2. Wavelet Tree for text T= AGCAGT$
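The access and rank queries described for the wavelet tree in Sec. II.D can be sketched in Python as follows. This is a minimal pointer-based sketch, not the succinct structure used in the experiments: the class name, the choice of the middle alphabet symbol as pivot, and the use of plain Python lists for the bitmaps are assumptions made for illustration, so the tree shape may differ from the one drawn in Fig. 2. Counting bits by scanning a list takes linear time, whereas a succinct bit vector answers the same query in constant time.

```python
class WaveletTree:
    """Pointer-based wavelet tree; each internal node stores one bitmap."""

    def __init__(self, text, alphabet=None):
        self.alphabet = sorted(set(text)) if alphabet is None else alphabet
        self.left = self.right = None
        self.bits = []
        if len(self.alphabet) > 1:
            mid = len(self.alphabet) // 2
            self.pivot = self.alphabet[mid]
            # 0 marks characters smaller than the pivot, 1 all others.
            self.bits = [0 if ch < self.pivot else 1 for ch in text]
            self.left = WaveletTree(
                [ch for ch in text if ch < self.pivot], self.alphabet[:mid])
            self.right = WaveletTree(
                [ch for ch in text if ch >= self.pivot], self.alphabet[mid:])

    def access(self, i):
        """Character at 0-based position i."""
        if self.left is None:          # leaf: a single symbol remains
            return self.alphabet[0]
        bit = self.bits[i]
        pos = self.bits[:i].count(bit)  # position of this character in the child
        child = self.right if bit else self.left
        return child.access(pos)

    def rank(self, ch, i):
        """Occurrences of ch among the first i characters (ch must be in the alphabet)."""
        if self.left is None:
            return i
        bit = 0 if ch < self.pivot else 1
        count = self.bits[:i].count(bit)
        child = self.right if bit else self.left
        return child.rank(ch, count)


text = "AGCAGT$"
wt = WaveletTree(text)
print(wt.bits)                                     # [0, 1, 1, 0, 1, 1, 0]
print("".join(wt.access(i) for i in range(len(text))))  # AGCAGT$
print(wt.rank("G", 5))                             # 2
```

Each query visits one node per level, which is where the O(log |Σ|) bound comes from when the per-node bit counting is done in constant time.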

III. PROPOSED ALIGNMENT TECHNIQUE USING BWT-WT

BWA uses Burrows Wheeler based indexing, which takes space proportional to the size of the reference string (about 3 billion characters in the case of the human genome). The main disadvantage of the above techniques is that the BWT itself does not offer compression but only arranges the text in a compressible form. For compression, other external techniques are used, such as Move-to-Front encoding (MTF), Run Length Encoding and Variable Length prefix coding. These require a lot of computational overhead and raise peak memory use. To overcome these problems, we introduce a new indexing technique, BWT-WT, based on the BWT and the Wavelet Tree (WT). The wavelet tree compresses the string itself and can also be used as a component of other compression tools. It uses the binary succinct data structure RRR [14] (a structure that represents each character in optimal space and gives very fast rank/select operations) to compress the WT nodes and answer rank/select queries in constant time. Another advantage of the WT is that it can be extended by changing its shape (Huffman shape and its variants) and by using a compression booster algorithm to reach a high level of compression. The block diagram of the proposed technique is shown in Fig. 3.

Fig. 3. Block Diagram of Wavelet Tree based Indexing and Searching Technique

A. Index based on BWT-WT

A wavelet tree encodes a given text T as a binary tree. The tree is constructed by defining a subtext for each node, which is then encoded as a bit string generated by comparing the elements of the subtext with a pivot element p. Each character c smaller than p is represented by a '0', while characters greater than or equal to p are encoded by a '1'.

The bit string in turn defines the strings of the child nodes: all characters represented by '0' form the substring of the left child, and all characters encoded by '1' form the substring of the right child. A wavelet tree for the BWT of "AGCAGCAGACT$" and its index are shown in Fig. 4 and Table II respectively.

B. String Searching in the BWT-WT index

Before a pattern P can be searched in the reference string, the following operations are required. Given the wavelet tree for the text T, the algorithms given in Fig. 5 are used to search for a string P in T.

Fig. 4. Wavelet Tree Index Based on BWT

TABLE II. INDEX FOR STRING T = AGCAGCAGACT$

i   Suffix#  BWT(T)  Sorted Suffixes
1   12       T       $
2   9        G       ACT$
3   7        C       AGACT$
4   4        C       AGCAGACT$
5   1        $       AGCAGCAGACT$
6   6        G       CAGACT$
7   3        G       CAGCAGACT$
8   10       A       CT$
9   8        A       GACT$
10  5        A       GCAGACT$
11  2        A       GCAGCAGACT$
12  11       C       T$

The following functions are used for backward searching:

WTBWT-access(i)   // access(i) - the character at position i in text T
(1) v ← root; r ← i
(2) while v is not a leaf do
(3)   if Bv[r] = 0 then
(4)     r ← rank0(Bv, r)   // rankc(T, i) - the number of occurrences of character c at or before position i in text T
(5)     v ← leftchild(v)
(6)   else
(7)     r ← rank1(Bv, r)
(8)     v ← rightchild(v)
(9) return label of v

WTBWT-rank(c, i)
(1) v ← root; r ← i
(2) while v is not a leaf do
(3)   if c is in the left sub-tree of v then
(4)     r ← rank0(Bv, r)
(5)     v ← leftchild(v)
(6)   else
(7)     r ← rank1(Bv, r)
(8)     v ← rightchild(v)
(9) return r

WTBWT-select(c, i)   // selectc(T, i) - the position of the ith occurrence of c in text T
(1) v ← leaf representing c; r ← i
(2) while v is not root do
(3)   p ← parent(v)
(4)   if v is the left child of p then
(5)     r ← select0(Bp, r)
(6)   else
(7)     r ← select1(Bp, r)
(8)   v ← p
(9) return r

WTsearchT(c, (st, ed))
1. Let C[c] be the total number of characters in T that are alphabetically smaller than c
2. st' = C[c] + rank(st - 1, c) + 1
3. ed' = C[c] + rank(ed, c)
4. return (st', ed')

Fig. 5. String searching functions used in the BWT-WT index

Using the above functions (Fig. 5), the suffix interval of a pattern P is derived. For a pattern P specified by its suffix range (st, ed) in the Suffix Array (SA), the operation WTsearchT(c, (st, ed)) returns the suffix range in SA of the string cP, where c is any character, st is the start of the range (initially st = 1) and ed is the end of the range (initially the size of the text T). Here rank(i, c) denotes the number of occurrences of c in the first i characters of the BWT string. The suffix interval of a pattern P in the reference string T is found by applying the operation to the characters of P from last to first. Example 1 explains the algorithm.

Example 1:
For the text T = AGCAGCAGACT$, TBWT = TGCC$GGAAAAC. Let the pattern P to be searched in T be "GCA". Here i = 3 (size of the pattern), st = 1 (initially) and ed = 12 (size of the text T).

Step 1: for i = 3
c = P[3] = A
st = C[A] + rank(0, A) + 1 = 1 + 0 + 1 = 2
ed = C[A] + rank(12, A) = 1 + 4 = 5
i.e. the suffixes starting with A occupy the suffix interval [2, 5] in Table II.

Step 2: for i = 2
c = P[2] = C
st = C[C] + rank(1, C) + 1 = 5 + 0 + 1 = 6
ed = C[C] + rank(5, C) = 5 + 2 = 7
i.e. the suffixes starting with CA occupy the suffix interval [6, 7] in Table II.

Step 3: for i = 1
c = P[1] = G
st = C[G] + rank(5, G) + 1 = 8 + 1 + 1 = 10
ed = C[G] + rank(7, G) = 8 + 3 = 11, so [st, ed] = [10, 11]
i.e. the suffixes starting with the pattern GCA occupy the suffix interval [10, 11] in Table II.
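Example 1 can be replayed with a few lines of Python. The sketch below is an assumption-laden illustration rather than the BWT-WT implementation: the helper names are hypothetical, rank is computed by scanning the BWT string instead of querying a wavelet tree, and the suffix array is built by naive sorting. It keeps the 1-based interval convention of the WTsearchT function in Fig. 5.

```python
def bwt_of(text):
    """BWT via sorted cyclic rotations (text must end with the sentinel '$')."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(row[-1] for row in rotations)


def count_table(text):
    """C[c] = number of characters in text that are strictly smaller than c."""
    table, total = {}, 0
    for ch in sorted(set(text)):
        table[ch] = total
        total += text.count(ch)
    return table


def backward_search(pattern, bwt, C):
    """Return the 1-based suffix interval (st, ed) of pattern, or None if absent."""
    st, ed = 1, len(bwt)
    for ch in reversed(pattern):          # process the pattern from last to first
        if ch not in C:
            return None
        new_st = C[ch] + bwt[:st - 1].count(ch) + 1   # C[c] + rank(st - 1, c) + 1
        new_ed = C[ch] + bwt[:ed].count(ch)           # C[c] + rank(ed, c)
        st, ed = new_st, new_ed
        if st > ed:
            return None
    return st, ed


text = "AGCAGCAGACT$"
bwt = bwt_of(text)
C = count_table(text)
sa = sorted(range(len(text)), key=lambda i: text[i:])   # 0-based suffix array

st, ed = backward_search("GCA", bwt, C)
positions = sorted(sa[k - 1] + 1 for k in range(st, ed + 1))  # 1-based positions in T

print(bwt)        # TGCC$GGAAAAC
print((st, ed))   # (10, 11)
print(positions)  # [2, 5]
```

Replacing the two `count` calls with the rank query of a wavelet tree over the BWT string gives the search procedure described in this section without changing the interval arithmetic.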

The suffix numbers in rows 10 and 11 of Table II are 5 and 2. Hence the pattern GCA occurs twice in the string T = AGCAGCAGACT$, starting at the 2nd and 5th positions of T.

IV. EXPERIMENTAL SETUP & RESULTS ANALYSIS

The experiments were conducted on an HP Pavilion g series machine with a 2.8 GHz four-core Intel Core i3-860 chip with 4 MB L3 cache, but no parallelism was used. The machine runs the 64-bit Ubuntu 12.04 operating system and has 4 GB of internal memory and one 500 GB Serial ATA hard drive (7,200 RPM). The following real-world biological and non-biological data were used to test the efficiency and usability of the proposed method:

1. The human genome sequences from NCBI.
2. Protein data from the Pizza & Chili Corpus.
3. English texts from the Wikipedia dump.
4. Simulated data of a DNA sequence (Arabidopsis thaliana) and its short read archives (http://plants.ensembl.org/), used to compare the CPU times given in TABLE V.

G++ 4.7.3 is used to build all the source code for the experiments, together with the Succinct Data Structure Library (SDSL).

TABLE III shows the space required by the index prepared for BWT-WT. The index size of the proposed BWT-WT approach is compared with that of the other tools, BWA [5], SOAP [9] and Bowtie [11], in TABLE IV and Figure 5. The CPU time of the proposed scheme is compared with that of the others in TABLE V.

TABLE III. PROPOSED INDEX (BWT-WT) SPACE ANALYSIS

Sequence  Input size   Index size           Count array   Number of         Size of auxiliary  Construction
          (N bytes)    (bytes)              size (bytes)  suffixes (bytes)  data (bytes)       time (s)
English   32619430     36326025 (~1.11 N)   1028          2038716           1876136            39.94
DNA       52428801     33881529 (~0.65 N)   1028          3276804           3063596            52.567
Protein   52428801     43720341 (~0.83 N)   1028          3276804           3063444            54.801
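The "~N" factors in TABLE III are simply the index size divided by the input size. As a small check, the following Python sketch recomputes them from the byte counts in the table; the dictionary layout and variable names are only illustrative.

```python
# (input size in bytes, index size in bytes), copied from TABLE III
rows = {
    "English": (32619430, 36326025),
    "DNA": (52428801, 33881529),
    "Protein": (52428801, 43720341),
}

# Index size expressed as a multiple of the input size N
ratios = {name: round(index / size, 2) for name, (size, index) in rows.items()}

print(ratios)  # {'English': 1.11, 'DNA': 0.65, 'Protein': 0.83}
```

A ratio below 1 means the index is smaller than the input it represents, which is the case for the DNA and protein inputs but not for the English text.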

TABLE IV. INDEX SIZE COMPARISON OF PROPOSED BWT-WT INDEX WITH OTHERS

File     File Size  Bowtie   BWA      SOAP      BWT-WT
Genome   50 MB      61.5 MB  62.1 MB  114.5 MB  32.3 MB
E-coli   15.3 MB    16 MB    16 MB    30.7 MB   9.9 MB

TABLE V. CPU TIME COMPARISON OF DIFFERENT ALIGNMENT TECHNIQUES FOR SIMULATED DATA

Program  Read length, single pair-end read (bp)  CPU time (s)
Bowtie   36                                      375
SOAP     36                                      249
BWA      36                                      289
BWT-WT   36                                      284

Figure 5. Comparison of index size (MB) of proposed scheme with others

V. CONCLUSION

In this paper, it is shown how to extend the BWT-based approach to a WT-based data structure for compressed indexes. BWT-WT is a simple and fast scheme for short read alignment. Experiments show that the BWT-WT based program achieves more compression and more efficient searching than the BWT-based approaches. As future work, approximate matching (insertions, deletions, gaps) can be considered.

REFERENCES

[1] D. Adjeroh, T. Bell, and A. Mukherjee. The Burrows-Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching. Springer, 1 edition, 2008.
[2] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Systems Research, Research R(124): pp. 1–24, 1994.
[3] P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro. Compressed representations of sequences and full-text indexes. ACM Trans. Algorithms, 3, 2007.
[4] R. Grossi, A. Gupta, and J. S. Vitter. High-order entropy-compressed text indexes. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, SODA '03, pp. 841–850, Philadelphia, PA, USA, 2003. Society for Industrial and Applied Mathematics.
[5] H. Li and R. Durbin. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14): pp. 1754–1760, 2009.
[6] U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. In Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms, SODA '90, pp. 319–327, Philadelphia, PA, USA, 1990. Society for Industrial and Applied Mathematics.
[7] D. Zhang and Q. Liu. Compression and Indexing based on BWT: A Survey. Web Information System and Application Conference, 2013.
[8] M. Schindler. A fast block-sorting algorithm for lossless data compression. In Proceedings of the Conference on Data Compression (Vol. 469). IEEE Computer Society, March 1997.
[9] R. Li, Y. Li, K. Kristiansen, and J. Wang. SOAP: short oligonucleotide alignment program. Bioinformatics, 24(5): pp. 713–714, 2008.
[10] B. Langmead, C. Trapnell, M. Pop, and S. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, Vol. 10, Issue 3, Article R25, 2009.
[11] Succinct Data Structure Library: https://github.com/simongog/sdsl-lite
[12] H. Li and R. Durbin. Fast and accurate long read alignment with Burrows–Wheeler transform. Bioinformatics, 26(5): pp. 589–595, 2010.
[13] R. Raman, V. Raman, and S. Srinivasa Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In SODA, pp. 233–242, 2002.
