RNA-seq Part 4: Counting Reads with HTSeq-count

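Counting the reads produced by sequencing is the quantification step of gene expression analysis. Each read is assigned to a gene according to the overlap between the read's aligned position and the gene's annotated position, and the reads assigned to each gene are then tallied. The per-sample counts are finally combined into a counts matrix.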
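
The overlap-based assignment described above can be sketched in a few lines of Python. This is only a toy illustration, not HTSeq's actual implementation: the gene names, coordinates, and the helper functions `assign_read` and `count_reads` are all hypothetical, and strand, exons, and paired-end mates are ignored. The special categories `__no_feature` and `__ambiguous` mirror the names htseq-count uses in its output for reads that overlap no gene or more than one gene.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (start, end), half-open coordinates.
GENES = {
    "geneA": (100, 200),
    "geneB": (150, 300),
    "geneC": (500, 600),
}


def assign_read(read_start, read_end, genes=GENES):
    """Return the gene a read overlaps, or a special category name."""
    # Two half-open intervals overlap when each starts before the other ends.
    hits = [
        name
        for name, (gene_start, gene_end) in genes.items()
        if read_start < gene_end and gene_start < read_end
    ]
    if not hits:
        return "__no_feature"   # read overlaps no annotated gene
    if len(hits) > 1:
        return "__ambiguous"    # read overlaps more than one gene
    return hits[0]


def count_reads(reads, genes=GENES):
    """Tally how many reads are assigned to each gene or category."""
    counts = Counter({name: 0 for name in genes})
    for read_start, read_end in reads:
        counts[assign_read(read_start, read_end, genes)] += 1
    return counts


# Hypothetical aligned reads as (start, end) pairs on a single chromosome.
reads = [(110, 140), (160, 190), (250, 290), (520, 560), (700, 750)]
counts = count_reads(reads)

for name, n in sorted(counts.items()):
    print(name, n, sep="\t")
```

In this sketch a read that falls in the region shared by geneA and geneB is counted as ambiguous rather than being credited to either gene, which is why overlapping annotations can lower the counts of the genes involved.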


In this post

1. Counting reads with HTSeq-count

2. Merging the counts matrices in R


1. Counting reads with HTSeq-count

First, look at how htseq-count is invoked. Its help output, with a few added notes, is shown below:

usage: htseq-count [options] alignment_file gff_file
positional arguments:
  samfilenames          Path to the SAM/BAM files containing the mapped reads.
                        If '-' is selected, read from standard input
  featuresfilename      Path to the file containing the features

optional arguments:
  -h, --help            show this help message and exit
  -f {sam,bam}, --format {sam,bam}
                        type of <alignment_file> data, either 'sam' or 'bam'
                        (default: sam) 
                        (Note: input file format, sam or bam; defaults to sam.)
  -r {pos,name}, --order {pos,name}
                        'pos' or 'name'. Sorting order of <alignment_file>
                        (default: name). Paired-end sequencing data must be
                        sorted either by position or by read name, and the
                        sorting order must be specified. Ignored for single-
                        end data.
                        (Note: sort order of the input file; defaults to sorting by read name.)
  --max-reads-in-buffer MAX_BUFFER_SIZE
                        When <alignment_file> is paired end sorted by
                        position, allow only so many reads to stay in memory
                        until the mates are found (raising this number will
                        use more memory). Has no effect for single end or
                        paired end sorted by name
  -s {yes,no,reverse}, --stranded {yes,no,reverse}
                        whether the data is from a strand-specific assay.
                        Specify 'yes', 'no', or 'reverse' (default: yes).
                        'reverse' means 'yes' with reversed strand
                        interpretation
  -a MINAQUAL, --minaqual MINAQUAL
                        skip all reads with align