Metagenomics - Remove Host Sequences
Quick solution: get the paired reads that do not map to the host reference genome (both reads unmapped).
# download a ready-to-use bowtie2 database of the human host genome GRCh38 (hg38)
wget https://genome-idx.s3.amazonaws.com/bt/GRCh38_noalt_as.zip
unzip GRCh38_noalt_as.zip
# run bowtie2 mapping (using --un-conc-gz to get gzip-compressed output files; 8 processors)
bowtie2 -p 8 -x GRCh38_noalt_as -1 SAMPLE_R1.fastq.gz -2 SAMPLE_R2.fastq.gz --un-conc-gz SAMPLE_host_removed > SAMPLE_mapped_and_unmapped.sam
The --un-conc option gives results comparable to the samtools filter -F 2 (excluding reads "mapped in proper pair"): it keeps every pair that did not align concordantly.
Pairs in which only one read maps to the host, or in which both reads map but not as a proper pair, may therefore still be included in the "host removed" output.
For finer control over read filtering, see the workflow below.
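To make the difference between the two filters concrete, the following sketch mimics the flag logic of samtools view in plain shell arithmetic. The helper name passes_filter and the three example flag values are illustrative assumptions, as is the stricter filter -f 12 (read unmapped and mate unmapped), which the text above does not prescribe.

```shell
# SAM flag bits used below (from the SAM specification):
#   1 = read paired, 2 = mapped in proper pair,
#   4 = read unmapped, 8 = mate unmapped

# passes_filter FLAG REQUIRED EXCLUDED   (hypothetical helper)
# Mimics the logic of `samtools view -f REQUIRED -F EXCLUDED`:
# keep a record if all REQUIRED bits are set and no EXCLUDED bit is set.
passes_filter() {
    [ $(( $1 & $2 )) -eq "$2" ] && [ $(( $1 & $3 )) -eq 0 ]
}

# Example flags for the first read (R1) of a pair:
#   99 = paired, proper pair, mate reverse, first in pair (concordant host hit)
#   73 = paired, mate unmapped, first in pair (only this read maps to the host)
#   77 = paired, read unmapped, mate unmapped, first in pair (no host hit)
for flag in 99 73 77; do
    if passes_filter "$flag" 0 2;  then loose=keep;  else loose=drop;  fi
    if passes_filter "$flag" 12 0; then strict=keep; else strict=drop; fi
    echo "flag=$flag  -F 2: $loose  -f 12: $strict"
done
```

In this sketch the read with flag 73 is kept by -F 2 but dropped by -f 12, which is the case described above: a pair with one host-mapped read can slip into the "host removed" output.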
If the multi-processor option -p is used, output reads might be in a different order than in the input files.
(Read order refers to the .sam output, but might also affect the host-removed read output files .1 and .2.)
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#performance-options
a) bowtie2 mapping against host genome: write all (mapped and unmapped) reads to a single .bam file
b) samtools view: use filter flags to extract the unmapped reads
c) samtools fastq: split paired-end reads into separate R1 and R2 fastq files
1) download a ready-to-use bowtie2 database of the host genome (here: human GRCh38)
wget https://genome-idx.s3.amazonaws.com/bt/GRCh38_noalt_as.zip
unzip GRCh38_noalt_as.zip
# move all files into your working directory (or into your predefined $BOWTIE2_INDEXES location)
→ see all available bowtie2 databases (host species list is shown on the right)
2) bowtie2 mapping against host sequence database, keep both aligned and unaligned reads (paired-end reads)
bowtie2 -p 8 -x host_DB -1 SAMPLE_R1.fastq.gz -2 SAMPLE_R2.fastq.gz -S SAMPLE_mapped_and_unmapped.sam
http://www.htslib.org/doc/samtools-fasta.html
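The conversion, filtering and splitting that follow the bowtie2 mapping could be sketched as below. This is one possible implementation, not necessarily the exact commands intended here: it assumes samtools is installed, uses -f 12 -F 256 (both reads unmapped, no secondary alignments) as the filter, and the intermediate file names as well as the run and remove_host helpers are hypothetical.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1 (hypothetical helper)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# remove_host SAMPLE [THREADS]   (hypothetical helper)
# Assumes SAMPLE_mapped_and_unmapped.sam exists (bowtie2 -S output of the mapping step).
remove_host() {
    sample=$1
    threads=${2:-8}

    # convert the SAM file to BAM (still all mapped and unmapped reads)
    run samtools view -b -@ "$threads" \
        -o "${sample}_mapped_and_unmapped.bam" \
        "${sample}_mapped_and_unmapped.sam" || return 1

    # keep only pairs where both reads are unmapped (-f 12),
    # excluding secondary alignments (-F 256)
    run samtools view -b -f 12 -F 256 -@ "$threads" \
        -o "${sample}_bothReadsUnmapped.bam" \
        "${sample}_mapped_and_unmapped.bam" || return 1

    # sort by read name so that the two reads of a pair are adjacent
    run samtools sort -n -@ "$threads" \
        -o "${sample}_bothReadsUnmapped_sorted.bam" \
        "${sample}_bothReadsUnmapped.bam" || return 1

    # split into R1 and R2 fastq files; singletons and other reads are discarded
    run samtools fastq -@ "$threads" \
        -1 "${sample}_host_removed_R1.fastq.gz" \
        -2 "${sample}_host_removed_R2.fastq.gz" \
        -0 /dev/null -s /dev/null -n \
        "${sample}_bothReadsUnmapped_sorted.bam"
}
```

Calling `DRY_RUN=1 remove_host SAMPLE` prints the four samtools commands without running them, which is a cheap way to review the filter flags before processing real data.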
Result
Two files of paired-end reads, containing non-host sequences
SAMPLE_host_removed_R1.fastq.gz
SAMPLE_host_removed_R2.fastq.gz
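As a quick sanity check on the result, both files should contain the same number of reads. The sketch below counts them, assuming standard four-line FASTQ records; the helper names count_reads and check_pairs are hypothetical.

```shell
# count_reads FILE.fastq.gz: print the number of reads,
# assuming 4 lines per FASTQ record (hypothetical helper)
count_reads() {
    echo $(( $(gzip -cd "$1" | wc -l) / 4 ))
}

# check_pairs R1.fastq.gz R2.fastq.gz: print both counts and
# succeed only if the two files hold the same number of reads
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

Usage: `check_pairs SAMPLE_host_removed_R1.fastq.gz SAMPLE_host_removed_R2.fastq.gz`. A non-zero exit status indicates that the pairing was broken somewhere in the workflow.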
See also: