Bio Tools Booklet

Uploaded by

Tanvi Garg

BB2440 Bioinformatics and Biostatistics - HT 2020

Bioinformatics Tools Booklet

There is a seemingly endless supply of bioinformatics tools. The field of
bioinformatics itself is very broad and, on top of that, there are often many ways of
solving a problem, which will be more or less adequate in different scenarios. This list
includes all the tools that you need to complete the labs (plus a few extra), but is by
no means exhaustive.

Useful commands in the terminal

cat file              displays an entire file
less file             shows a file one screen at a time; press q to leave
pwd                   prints the path to the current directory (shows you where you are)
head -n N file        displays the first N lines of a file
tail -n N file        displays the last N lines of a file
wc file               displays the number of lines, words and bytes in a file
ls                    lists the files in the current directory
ls -lh                lists the files in the current directory, including size and permissions
cp file1 file2        makes a copy of file1 called file2
mv file1 file2        moves file1 to the name or location file2
rm file               erases (removes) file
mkdir dir             makes a directory called dir
cd dir                goes into directory dir
rmdir dir             removes directory dir, if it is empty
grep pattern file     prints all lines of the file that contain the pattern
grep -c pattern file  counts how many lines in the file contain the pattern
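
As an illustration of how these commands combine, the sketch below builds a tiny FASTA file and inspects it. The file name toy.fasta and its three sequences are invented for this example and are not part of the labs.

```shell
# Work in a throwaway directory so nothing in your own folders is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Write a made-up FASTA file with three short sequences (illustrative only).
printf '>seq1\nATGCATGC\n>seq2\nGGGTTTAA\n>seq3\nATGAAATAG\n' > toy.fasta

pwd                      # where am I?
ls -lh                   # toy.fasta with its size and permissions
head -n 2 toy.fasta      # first record: header line plus sequence line
wc toy.fasta             # lines, words and bytes

# Each FASTA record starts with '>', so counting those lines counts sequences.
n_seqs=$(grep -c '>' toy.fasta)
echo "sequences: $n_seqs"

cp toy.fasta copy.fasta  # make a copy ...
rm copy.fasta            # ... and remove it again
```

Counting header lines with grep -c is a quick check that works on any FASTA file, including the output of the gene finders described below.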

Prodigal
Prodigal is a gene-finding tool for bacteria and archaea. A special mode can be used
for certain bacteria which have non-standard genetic codes. Prodigal outputs the
coordinates of the genes found and their translation into protein. To also get the
nucleotide sequences of the genes, run Prodigal from the command line as follows:

$ prodigal -i input_file -d nucleotide_output_file -a aminoacid_output_file

Prodigal is available on Bioconda. To install it, run

$ conda install -c bioconda prodigal
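
A quick sanity check after a Prodigal run is to count the predicted genes. The sketch below assumes a hypothetical input file genome.fasta and hypothetical output names (genes.fna, proteins.faa, coordinates.txt); Prodigal is only run if both the program and the input file are present.

```shell
# Count FASTA records: every record has exactly one header line starting with '>'.
count_records() {
    grep -c '^>' "$1"
}

genome=genome.fasta   # hypothetical input file name

if command -v prodigal >/dev/null 2>&1 && [ -f "$genome" ]; then
    # -d: nucleotide sequences of the genes, -a: protein translations.
    # The gene coordinates go to standard output, so they are redirected to a file.
    prodigal -i "$genome" -d genes.fna -a proteins.faa > coordinates.txt
    # Both files should report the same number of genes.
    echo "genes (nucleotide): $(count_records genes.fna)"
    echo "genes (protein):    $(count_records proteins.faa)"
else
    echo "prodigal or $genome not found; skipping the run"
fi
```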

GenScan: http://hollywood.mit.edu/GENSCAN.html
GenScan is a web-based tool for finding genes and exons in nucleotide sequences. It
is meant for vertebrates and certain plants. If the sequences to be scanned are too
large, it is possible to download GenScan and run it from the command line.

BLAST: Basic Local Alignment Search Tool
Blast is a tool for comparing sequences to each other. This can be used simply to
compare two sequences or to compare a sequence of interest against a very large
database. The standard usage of blast is to compare against a database. The Blast
suite includes many different tools; the main ones are:

Nucleotide blast is used to compare nucleotide sequences against each other.
Megablast is optimized for very similar sequences, while less similar sequences can
be found using blastn.

Protein blast is used to compare amino acid sequences against each other.
The standard way of doing this is through blastp. The other protein blast tools are
designed to find more distantly related proteins (e.g. PSI-blast) by considering the
conservation pattern of amino acids and protein domains.

Other Blast tools are used to compare nucleotide sequences against protein
sequences. This is done by translating the nucleotides into proteins in all three
possible frames for each DNA strand. Blastx is used for comparing a nucleotide
sequence to a protein database, while Tblastn is used for comparing a protein
sequence against a nucleotide database. Tblastx compares nucleotide sequences
against each other, after translating both into protein. This is useful for identifying
proteins with similar functions in distantly related organisms.

Blast can be run online or locally from the command line. In the latter case,
you can build your own database of relevant reference sequences.

Online BLAST: https://blast.ncbi.nlm.nih.gov

The blast server includes several databases. The most popular ones are nr, the
non-redundant collection that pools sequences from several source databases, and
RefSeq, which includes only well annotated, carefully selected references. In some
cases, instead of
giving single proteins as hits, Blast will give whole annotated genomes. In this case,
one must open the genome in question, go to the position of the match (marked in
the Blast output) and read the annotation there.
To compare sequences against each other, one must check the box “Align two
or more sequences” in online Blast.

BLAST+
BLAST+ is a set of command line tools that have the same functionality as online
Blast but use custom, locally-built databases.
To format a BLAST database, use the command makeblastdb as follows:
$ makeblastdb -dbtype type -in input_fasta_file -out database_name
where type is either prot (for a protein database) or nucl (for a nucleotide database).
Several files will be created with the name database_name plus an extension (three or
more, depending on the BLAST+ version).
To run a nucleotide BLAST search, type:
$ blastn -query query_fasta_file -db database_name -evalue evalue_threshold -outfmt output_format > output_file
For output_format, use 7 (a table with comment lines).
The commands blastp, blastx, tblastn, and tblastx can be used analogously. To
see all the options available for a given tool, type the name of the desired tool
followed by the flag -help. Note that the default e-value threshold (10) is very high
and will give many false positives. It is usually a good idea to use a much lower
value, such as 1e-10.

BLAST+ is available on Bioconda. To install it, run

$ conda install -c bioconda blast
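
Put together, a local search could look like the sketch below. The file names (refs.fasta, query.fasta, refdb, hits.tsv) and the toy sequence are invented for this example, and the BLAST+ commands are only run if they are installed.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up reference and query sequences (illustrative only).
ref=ATGGCTAGCTAGGATCGATCGTACGATCGATCGGCTAGCTAGCTAGGCTAGCATCGAT
printf '>ref1\n%s\n' "$ref" > refs.fasta
printf '>query1\n%s\n' "$ref" > query.fasta

if command -v makeblastdb >/dev/null 2>&1 && command -v blastn >/dev/null 2>&1; then
    # Build a nucleotide database called refdb from the reference FASTA file.
    makeblastdb -dbtype nucl -in refs.fasta -out refdb
    # Search with a strict e-value threshold; -outfmt 7 gives a table with '#' comment lines.
    blastn -query query.fasta -db refdb -evalue 1e-10 -outfmt 7 > hits.tsv
    # Show only the hit lines by dropping the comment lines.
    grep -v '^#' hits.tsv || echo "no hits below the threshold"
else
    echo "BLAST+ not found; install it with: conda install -c bioconda blast"
fi
```

Dropping the comment lines with grep -v leaves one line per hit, which is convenient for counting hits with wc or for loading the table into other programs.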

tRNAscan: http://lowelab.ucsc.edu/tRNAscan-SE/
tRNAscan is a tool for identifying transfer RNA in nucleotide sequences. It can be
run online or downloaded to be run locally.

tRNAscan is available on Bioconda. To install it, run

$ conda install -c bioconda trnascan-se

Barrnap
Barrnap is a tool for finding ribosomal RNA in nucleotide sequences. It can take
bacterial, archaeal and eukaryotic sequences.

Barrnap is available on Bioconda. To install it, run

$ conda install -c bioconda barrnap
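
A minimal sketch of a Barrnap run, assuming a hypothetical bacterial genome file genome.fasta and a hypothetical output name rrna.gff. Barrnap writes its predictions in GFF3 format to standard output, and the --kingdom option selects the model set (bac, arc, euk or mito).

```shell
genome=genome.fasta   # hypothetical input file name

# Count feature lines in a GFF file, i.e. lines that are not '#' comments.
count_features() {
    grep -vc '^#' "$1"
}

if command -v barrnap >/dev/null 2>&1 && [ -f "$genome" ]; then
    # Predict rRNA genes with the bacterial models and save the GFF3 output.
    barrnap --kingdom bac "$genome" > rrna.gff
    echo "rRNA features found: $(count_features rrna.gff)"
else
    echo "barrnap or $genome not found; skipping the run"
fi
```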

Multiple sequence aligners

Mafft: http://www.ebi.ac.uk/Tools/msa/mafft/
Muscle: http://www.ebi.ac.uk/Tools/msa/muscle/
Kalign: http://www.ebi.ac.uk/Tools/msa/kalign/
These three multiple sequence alignment tools are hosted at EBI. They run different
algorithms in the background, but the user interface is always the same. The
sequences to be aligned are pasted into a window or uploaded from a file. Protein and
nucleotide sequences are accepted, in a variety of formats. Several output formats
can also be chosen; the most used ones are Fasta and ClustalW. With the ClustalW
option, you can choose to colour amino acids according to their chemical properties,
which makes the alignment easier to visualize.

Weblogo: http://weblogo.berkeley.edu/logo.cgi
Weblogo is a tool for producing logos of conserved sequences based on short
multiple alignments. The fasta or clustalw sequences are pasted or uploaded, and an
image of the chosen format and size is generated.

Pfam: http://pfam.xfam.org/
Pfam is a large database of protein families, many of which have extensive
annotation. You can search through it by providing an accession number (provided
by e.g. online blast), keywords or an amino acid sequence.

UniProtKB: http://www.uniprot.org/
UniProtKB is a high-quality annotated protein database. The annotation is either
done manually (collected in the SwissProt database) or automatically (TrEMBL
database).

TMHMM: http://www.cbs.dtu.dk/services/TMHMM-2.0/
TMHMM is a tool for predicting transmembrane domains by inputting amino acid
sequences in fasta format. The output is a list of partitions of your protein sequence
into regions inside/outside the cell and regions inside the membrane, together with a
plot showing the probability for each amino acid to be placed in each type of region.

Philius: http://www.yeastrc.org/philius
Philius is a tool for predicting transmembrane domains and signal peptides based on
an amino acid sequence (fasta format is supported only by submitting it through an
e-mail form). The output is a confidence measure of the sequence being
transmembrane and a partitioning of your protein sequence into regions
inside/outside the cell and regions inside the membrane, together with a confidence
measure for each region (press the "show list" link next to "Predicted protein
segments" to view these statistics).

RDP Classifier: http://rdp.cme.msu.edu/classifier/classifier.jsp

The Ribosomal Database Project (RDP) Classifier is a tool for assigning a taxonomic
classification to ribosomal RNA sequences or subsequences. The 16S (SSU) of bacteria
and archaea and the 28S (LSU) of fungi can be used. The RDP can take up to 50 thousand
sequences at a time. In the results page, clicking on the “detailed view” option gives the
bootstrap support for each level of classification. RDP never assigns a species to a
sequence, stopping at the genus level.

RDP Hierarchy browser: http://rdp.cme.msu.edu/hierarchy/hb_intro.jsp

This tool helps you to obtain the ribosomal RNA sequences of many different species
out of the Ribosomal Database project. You can search and select the organisms of
interest and download their rRNA sequences, which could for example be used for a
phylogenetic analysis. For most organisms multiple rRNA sequences are listed; just
pick one of them if you want to make a phylogenetic tree.

Galaxy: https://usegalaxy.org/
Galaxy is an open source, web-based platform for data intensive biomedical
research. The interface is divided into three panels: Tools (left), Display (center) and
History (right). You use the Tools panel to upload data and select tools to run. Every
History (right). You use the tools panel to upload data and select tools to run. Every
time you upload data or run a tool a new item appears in the History panel. From the
History panel you can choose to view your raw data and/or results from the tools you
have used which will then be displayed in the Display panel. Some files are in binary
format (for example BAM files) and they cannot be viewed. If you choose to view
them they will be downloaded to your computer instead.
When you need to execute the same tool on a number of datasets, there is an
option available to run them all at once in parallel (as shown in the figure below).

Most or all of the tools available in Galaxy are also available as open source
software to be run from the command line. While that may be the ‘standard’ way to
run these tools, the Galaxy environment is a great platform to get familiar with the
programs, data files and the results.
Galaxy 101: https://usegalaxy.org/u/aun1/p/galaxy101

The following is a short list of the tools in Galaxy, some of which you will be using
through Galaxy in the labs:

FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
FastQC is a practical tool that allows you to check various quality aspects of your
sequencing data prior to any downstream analysis. The input format for FastQC is
sequencing data in the fastq format.

STAR: https://code.google.com/p/rna-star/
STAR is an ultrafast RNA sequencing aligner. It takes in sequencing data in fastq
format and aligns the sequences to a reference genome. The output is a list of
aligned sequences in SAM/BAM format.

Picard Tools: https://broadinstitute.github.io/picard/command-line-overview.html

Picard tools is an extensive toolbox that allows various quality checks, analysis and
manipulation of aligned sequencing data, usually in BAM or SAM format.

SAM tools: http://samtools.sourceforge.net/samtools.shtml

Samtools is an extensive toolbox that allows various quality checks, analysis and
manipulation of aligned sequencing data, usually in BAM or SAM format.

Cufflinks: http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/
Cufflinks is a transcript assembler; that is, it assembles aligned reads into transcripts,
i.e. exons and introns. It also handles the job of calculating FPKM values for
transcripts, both novel and known (annotated) ones. Furthermore, cufflinks includes a
module called cuffdiff that calculates differential expression between two (or more)
groups.
