Bio Info Merged
bioinformatics
• New protein sequence data used to be determined by direct sequencing of proteins, but are now mostly
obtained by translating DNA sequences; the latter are always hypothetical unless experimentally proven
• Pattern-recognition programs that predict proteins from DNA are subject to 3 types of errors:
• 1. Protein sequences might be missed entirely or incorrectly spliced; exons may be combined in
different ways in different tissues, which can't be predicted. If the mRNA is edited before
translation, that can't be known either.
• 2. No clue about quaternary structure, prosthetic group binding, or patterns of disulphide bridges
• 3. Post-translational modifications: covalent alterations within a cell, addition of a ligand, or cleavage of
a protein to an active form
• Inteins are protein segments with self-splicing activity: they excise themselves, in contrast to cleavage carried out by proteases
Transcriptomics
• Many RNA transcripts are not protein coding
• Transient: mRNA
• Stable: rRNA
• 1. RNA is converted to cDNA, which is then sequenced (RNA-seq) or quantified by real-time PCR
• 2. RNA can be sequenced directly
BLAST = Basic Local Alignment Search Tool
• Rapidly compare a query sequence to a database of subject sequences
• Generate alignments between them; the quality of each alignment is given by its ALIGNMENT SCORE
• Return alignments that pass user defined score and statistical significance thresholds
• BLAST uses local alignment to find high scoring segment pairs (HSP) between two
sequences
• BLAST HIT: A Subject sequence that is aligned to the query
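To make the idea of a local alignment score concrete, here is a minimal sketch using the classic Smith-Waterman recurrence. This is not how BLAST itself works internally (BLAST uses a faster seed-and-extend heuristic); the function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen only for illustration.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(subject) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# The query ACGT occurs exactly inside the subject: 4 matches x 2 = 8.
print(local_alignment_score("ACGT", "TTACGTTT"))
```

The highest-scoring local region found this way plays the role of a high scoring segment pair between the query and the subject.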
https://blast.ncbi.nlm.nih.gov/Blast.cgi
Common blast programs
BLASTN is for:
• Mapping oligonucleotides to genome
• Comparing DNA from closely related species
• Aligning expressed sequence tags to a genome
BLASTP is for:
• Exploring protein function
• Initial discovery for conserved domains
BLASTX
• Nucleotide query is translated into all 6 reading frames
• 3 reading frames in + strand
• 3 reading frames in – strand
• Each reading frame is compared to a protein database
• It is used for:
• Gene finding in genomic DNA (Annotations)
• Annotating ESTs
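The six-frame translation that BLASTX applies to a nucleotide query can be sketched as follows. This is an illustration only, assuming the standard genetic code; the function names are made up for this example and are not part of any BLAST software.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement (the - strand read 5' to 3')."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the 3 reading frames of the + strand and 3 of the - strand."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six translated frames would then be compared to the protein database; `*` marks a stop codon.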
TBLASTN
• Query is a protein sequence
• Nucleotide database is translated into 6 RFs
• The query is then compared to each RF
• It is used for
• Mapping a protein to genome database
• Finding ESTs that map to a protein sequence
• Finding RNA Seq reads that map to a protein sequence
TBLASTX
• BOTH query and subject database (both are nucleotides) are converted to 6 RFs and
compared
• It is best used for:
• Comparing the nucleotide sequence from distantly related species
• Identify coding regions in ESTs
• Sensitive but expensive
HOUSE-KEEPING GENES to read too!!
Polymerase Chain Reaction (PCR)
• PCR
• History of PCR
• Thermal cycler
• Components of PCR
• Three basic steps
• PCR program in thermal cycler
• General guidelines for primer.
• Application of PCR
• Advantages and disadvantages of PCR
What is PCR?
Why “Polymerase”?
Why “Chain”?
Principle of PCR
• The mind behind PCR was the American biochemist Kary Banks Mullis.
• He developed PCR in 1985 and was awarded the Nobel Prize in Chemistry in 1993 for his pioneering work.
• 1983: Kary Mullis, a scientist working for the Cetus Corporation, was driving along US Route 101 in northern
California when he came up with the idea for the polymerase chain reaction.
• In 1985, Cetus Corp. scientists isolated thermostable Taq polymerase (from T. aquaticus), which revolutionized PCR.
• Cetus rewarded Kary Mullis with a $10,000 bonus for his invention.
• Later, during a corporate reorganization, Cetus sold the patent for the PCR process to a pharmaceutical company.
Heat the reaction strongly to separate, or denature, the DNA strands. This provides single-
stranded template for the next step.
Cool the reaction so the primers can bind to their complementary sequences on the single-
stranded template DNA.
Raise the reaction temperatures so Taq polymerase extends the primers, synthesizing new strands
of DNA.
Denaturation
• The reaction mixture is heated to 90-98 °C so that the dsDNA
is denatured into single strands by disrupting the hydrogen bonds between
complementary bases.
• Duration of this step is 1-2 min.
• Temperature: typically 92-94 °C.
• Double-stranded DNA melts → single-stranded DNA.
Annealing
The reaction mixture is cooled to 45-60 °C.
• Primers are jiggling around caused by ???????
• Primers base pair with the complementary sequence in the DNA.
• Hydrogen bonds reform.
• Annealing is a fancy word for renaturing.
Temperature: ~45-70 °C (dependent on the melting temperature of the expected
duplex).
Extension
DNA polymerase binds to the annealed primers and extends the DNA at the 3' end of the chain.
The temperature is now raised to 72 °C, which is ideal for the polymerase.
Primers are extended by joining bases complementary to the template strand.
➤ During elongation the polymerase adds dNTPs in the 5' to 3' direction, reading the template from 3' to 5'; the bases
added are complementary to the template.
➤ The first cycle is now over and the next one begins. Because the PCR machine is an automated thermocycler, the same cycle is repeated
up to 30-40 times.
Temperature: ~72 °C
Time: 0.5-3 min
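The three steps can be written down as a simple thermal cycler program. The sketch below is only an illustration: the temperatures and the denaturation and extension times are taken from the ranges above, while the 30 s annealing time, the 55 °C annealing temperature, and the class and function names are assumptions. Ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    seconds: int


# One cycle. Annealing temperature and time are assumed values; in practice
# they depend on the primers used.
CYCLE = [
    Step("denaturation", 94, 60),
    Step("annealing", 55, 30),
    Step("extension", 72, 60),
]


def program_minutes(steps, cycles):
    """Total hold time in minutes for `cycles` repeats (ramping ignored)."""
    return cycles * sum(step.seconds for step in steps) / 60


for step in CYCLE:
    print(f"{step.name:>12}: {step.temp_c} C for {step.seconds} s")
print("30 cycles take about", program_minutes(CYCLE, 30), "minutes")
```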
General Guidelines for primers
1. Length:
• Shorter primers have a tendency to anneal to non-target sequences of the
DNA template.
• A short primer may offer sufficient specificity for a simple template such as a small plasmid.
• But a long primer may be required when using eukaryotic genomic DNA as
template. In practice, 20-30 nucleotides is generally satisfactory.
2. Mismatches:
• Primers do not need to match the template completely.
• It is often beneficial to have C or G as the 3' terminal nucleotide, which makes the
binding of the 3' end of the primer to the template more stable.
3. Melting Temperature Tm:
Melting temperature is the temperature at which one half of the DNA duplex
will dissociate and become single stranded. Typically the annealing
temperature is about 3-5 degrees Celsius below the Tm of the primers used.
Primers with melting temperatures in the range of 52-58°C generally produce
the best results. Primers with melting temperatures above 65°C have a
tendency for secondary annealing.
Tm can be calculated from the following formula:
Tm = (4 × [G+C]) + (2 × [A+T]) °C, where [G+C] and [A+T] are the numbers of G/C and A/T bases in the primer
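As a worked example of the formula, the sketch below computes Tm for a made-up 20-nucleotide primer and derives an annealing range 3-5 °C below it. The primer sequence and the function names are hypothetical, chosen only to illustrate the arithmetic.

```python
def wallace_tm(primer):
    """Tm in degrees Celsius: 4 per G/C base plus 2 per A/T base."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 4 * gc + 2 * at


def annealing_range(primer):
    """Annealing temperature range, 3-5 degrees below Tm."""
    tm = wallace_tm(primer)
    return tm - 5, tm - 3


primer = "AGCGTACGTTAGCCATGCAA"  # hypothetical primer: 10 G/C and 10 A/T bases
print(wallace_tm(primer))        # 4*10 + 2*10 = 60
print(annealing_range(primer))   # (55, 57)
```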
4. Internal Secondary Structure:
Should be avoided so that the primer does not fold back on itself and become
unavailable to bind to the template.
5. Primer-Primer Annealing:
It is also important that the two primers cannot anneal to each other.
Extension by DNA polymerase of two self-annealed primers leads to
formation of a primer dimer.
6. G/C content:
Ideally a primer should have a near random mix of nucleotides, a 50% G/C
content.
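Two of these guidelines are easy to check by hand or with a few lines of code: G/C content, and whether the 3' ends of the two primers can pair with each other (the starting point of a primer dimer). The sketch below is a simplified illustration; the function names and example primers are hypothetical, and real primer design tools use more detailed thermodynamic models.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100 * (primer.count("G") + primer.count("C")) / len(primer)


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def three_prime_overlap(primer_a, primer_b, max_len=8):
    """Longest k (up to max_len) such that the last k bases of the two
    primers are perfectly complementary to each other (antiparallel)."""
    a, b = primer_a.upper(), primer_b.upper()
    longest = 0
    for k in range(1, min(max_len, len(a), len(b)) + 1):
        if a[-k:] == reverse_complement(b[-k:]):
            longest = k
    return longest


print(gc_content("AGCGTACGTTAGCCATGCAA"))              # 50.0
# Both hypothetical primers end in CGCG, which pairs with itself.
print(three_prime_overlap("AAAAACGCG", "TTTTTCGCG"))   # 4
```

A long 3' overlap between the forward and reverse primer suggests redesigning one of them.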
Templates for PCR
• Body Fluids (Blood, CSF, Synovial, Sputum, Semen, Menstrual blood, Stool, Urine etc).
• Tissues
• Dried blood
• Semen stains
• Vaginal swabs
• Single hair
• Fingernail scrapings
• Insects in Amber
• Egyptian mummies
• Buccal Swab
• Toothbrushes
• Microorganisms (Bacteria, Fungi, Virus etc)
Things to try if PCR does not work
A) If no product (of correct size) produced:
1 Check DNA quality.
2 Reduce annealing temperature.
3 Increase magnesium concentration.
4 Add dimethylsulphoxide (DMSO) to assay (at around 10%).
5 Use different thermostable enzyme.
6 Throw out primers - make new stocks.
B) If extra spurious product bands present:
1 Increase annealing temperature
2 Reduce magnesium concentration
3 Reduce number of cycles
4 Try different enzyme
Variations of the PCR
Colony PCR
Nested PCR
Multiplex PCR
AFLP PCR (Amplified fragment length polymorphism)
Hot Start PCR
In Situ PCR
Inverse PCR
Asymmetric PCR
Reverse Transcriptase PCR
Allele specific PCR
Real time/Quantitative PCR
ARMS PCR (Amplification Refractory Mutation System)
Methylation Specific PCR
TaqMan™ Sequence Detection System
- Several types of chemistries have been developed for this direct detection of
PCR-copied sequences. One of the most popular is the TaqMan™ system illustrated
here.
- The quantitative capability of this
system stems from the direct correlation
that has been shown between the
starting number of target sequence
copies in the sample and the number of
amplification cycles required for the
instrument to first detect an increase in
reporter dye fluorescence associated
with the generation of new copies.
- Information from the standard curve and results from a single calibrator sample
containing known target cell numbers - that is extracted and run with the test
samples - can also be used to determine target cell numbers in the test samples
using a simple calculation called the comparative cycle threshold method as
illustrated here.
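The calculation itself is not reproduced in these notes, so the following is a minimal sketch under stated assumptions: the target amount changes by a constant factor E per cycle (E = 2 for perfect doubling), E is estimated from the slope of the standard curve (Ct plotted against log10 of starting quantity), and the test quantity is the calibrator quantity times E raised to the Ct difference. The function names and numbers are hypothetical.

```python
def efficiency_from_slope(slope):
    """Amplification factor per cycle from the standard-curve slope.

    Assumes Ct is plotted against log10(starting quantity); a slope of
    about -3.32 corresponds to perfect doubling (factor 2).
    """
    return 10 ** (-1 / slope)


def target_quantity(ct_test, ct_calibrator, calibrator_quantity, efficiency=2.0):
    """Estimate the test sample quantity relative to a known calibrator.

    A sample that crosses the threshold earlier (lower Ct) started with more
    target, by a factor of `efficiency` per cycle of difference.
    """
    return calibrator_quantity * efficiency ** (ct_calibrator - ct_test)


# Hypothetical numbers: calibrator with 1000 target cells at Ct 25,
# test sample detected 3 cycles earlier.
print(target_quantity(ct_test=22, ct_calibrator=25, calibrator_quantity=1000))  # 8000.0
```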
Application of PCR
Medical Application
• Genetic testing for presence of genetic disease mutations.
• Detection of disease-causing genes in parents suspected of being carriers.
• Study of alterations to oncogenes may help in customizing therapy.
• Can also be used as part of a sensitive test for tissue typing, which is vital to organ
transplantation, and for genotyping embryos.
• Helps to monitor the gene therapy.
Infectious Disease Application
• Analyzing clinical specimens for the presence of infectious agents, including HIV,
hepatitis, malaria, tuberculosis etc.
• Detection of new virulent subtypes of organism that is responsible for epidemics
Forensic Application
• Can be used as a tool in genetic fingerprinting.
• This technology can identify any one person from millions of others, for example in
crime scene investigation and paternity testing.
Research and Molecular Genetics
• Helps to compare the genomes of two organisms and identify the difference
between them.
• In phylogenetic analysis, minute quantities of DNA from any source, such as
fossilized material, hair, bones or mummified tissues, can be amplified.
• In the Human Genome Project, whose aim was the complete mapping and understanding of all
human genes.
Advantages of PCR
• Automated, fast, reliable (reproducible) results.
• Contained (lower chance of contamination).
• High output.
• Sensitive.
• Broad uses.
• Defined, easy to follow protocols.
More Cycles = More DNA
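The relationship between cycle number and product can be sketched with the idealized doubling model. The function name and the `efficiency` parameter are assumptions for illustration; real reactions plateau in later cycles as reagents run out.

```python
def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized copy number after `cycles` rounds of PCR.

    efficiency=1.0 means every template is copied in every cycle (doubling).
    """
    return start_copies * (1 + efficiency) ** cycles


for n in (10, 20, 30):
    print(n, "cycles:", int(copies_after(n)), "copies from one template")
```

With perfect doubling, 30 cycles turn a single template into 2^30, about a billion, copies.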
Sample problem: PCR in forensics
Suppose that you are working in a forensics lab. You have just received a DNA sample from a hair left at
a crime scene, along with DNA samples from three possible suspects. Your job is to examine a
particular genetic marker and see whether any of the three suspects matches the hair DNA for this
marker.
The marker comes in two alleles, or versions. One contains a single repeat (brown region below), while
the other contains two copies of the repeat. In a PCR reaction with primers that flank the repeat region,
the first allele produces a 200 bp DNA fragment, while the second produces a 300 bp DNA fragment:
You perform PCR on the four DNA samples and visualize the results by gel electrophoresis, as shown
below:
Which suspect's DNA matches the DNA from the crime scene at this marker?
Suspect 3
• Humans are diploid, meaning that they have two copies of most of their DNA. Thus, there will be two copies of the marker
we are examining in each of the DNA samples.
• If a person has two different alleles of the marker (is heterozygous), two different-sized bands ( 200 bp and 300 bp) will be
amplified during PCR. These will appear as bands of DNA in the gel at the 200 bp and 300 bp locations.
• If a person has two copies of the same allele (is homozygous), only one band will be amplified during PCR. If the person is
homozygous for the 200 bp allele, only a 200 bp band will be visible on the gel. Similarly, if the person is homozygous for
the 300 bp allele, only a 300 bp band will be visible on the gel.
• Suspect 2: heterozygous
• Both the DNA sample from the crime scene and the DNA sample from suspect 3 are homozygous for the 200 bp version of
the marker. That is, the two samples match for this marker.
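The reasoning above can be sketched as a small band-matching routine. The gel image is not reproduced in these notes, so the band pattern for suspect 1 below is a hypothetical placeholder; the patterns for the crime scene sample, suspect 2 and suspect 3 follow the explanation above. Function names are made up for this illustration.

```python
ALLELE_SIZES = {200, 300}  # fragment sizes (bp) of the two marker alleles


def genotype(bands):
    """Interpret the set of band sizes seen in one gel lane."""
    bands = frozenset(bands)
    if not bands or not bands <= ALLELE_SIZES:
        raise ValueError(f"unexpected bands: {sorted(bands)}")
    if len(bands) == 2:
        return "heterozygous"
    return f"homozygous {next(iter(bands))} bp"


def matching_suspects(crime_scene, suspects):
    """Names of suspects whose band pattern equals the crime scene pattern."""
    return [name for name, bands in suspects.items()
            if frozenset(bands) == frozenset(crime_scene)]


crime_scene = {200}
suspects = {
    "Suspect 1": {300},       # hypothetical: not stated in the notes
    "Suspect 2": {200, 300},  # heterozygous
    "Suspect 3": {200},       # homozygous for the 200 bp allele
}
print(genotype(crime_scene))                     # homozygous 200 bp
print(matching_suspects(crime_scene, suspects))  # ['Suspect 3']
```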
(A) Gradient temperature control of one single block with heating
and cooling elements at each end. (B) Applied Biosystems
VeriFlex Block with “better-than-gradient” temperature control,
featuring three separate independent blocks and individual
heating and cooling elements for each block.
Difference between primers and probes
Primers are short single-stranded DNA/RNA sequences that provide the starting point for the polymerase chain reaction (PCR).
Primers are usually designed to bind to a specific DNA/RNA sequence. In the cell, RNA primers are
the starting point of DNA replication.
qPCR probes are DNA sequences, similar to primers, that are typically labelled with a fluorophore
as a signalling molecule (molecular marker). The use of these probes allows for the quantification of
specific DNA sequences present in a sample (image 1).
Make sure it’s the right length
The specificity of a primer depends on its length.
• Primers, such as PCR primers, should be designed with a length of 18 to 24 nucleotides for ideal
amplification.
• Long primers have a slower hybridisation rate and a lower chance of annealing to the intended target
sequence.
• Slower hybridisation produces inadequate specificity and inadequate binding to the target sequence,
whereas faster hybridisation rates result in high target concentrations and maximum binding to the target
sequence.
The risk of slower hybridisation increases when a primer is longer than 30 nucleotides.
Even though long primers have a higher level of specificity than short primers, they are less efficient during the
annealing phase and produce less amplicon yield. Due to the build-up of by-products and the loss of components
necessary for DNA synthesis during a PCR, more cycles can result in less efficient outcomes. Short primers,
however, anneal to their target sequence more effectively and need fewer PCR cycles for amplicon generation
compared to long primers (Wu et al., 2010).
Unlike primers, the optimal length of probes is highly target-specific. The length generally selected by experts is
between 15 and 30 nucleotides. However, when longer probes are used instead of shorter ones, fewer probes per
gene may be required.
Moreover, Tm also determines the annealing temperature Ta, where primer binding occurs at the highest
efficiency and specificity. Ta affects the final amount of PCR and qPCR product yield.
The optimal melting temperature for maintenance of primer specificity is 54°C or higher (54°C to 65°C).
The Ta of a primer is usually set below its Tm, typically by 2-5°C.
When a primer is designed, its Tm should not be above 65°C, as it increases the risk of secondary annealing.
• The reason for this GC content range is simple. When primers and probes anneal to their target sequence, guanine (G)
and cytosine (C) form three hydrogen bonds, whereas adenine (A) and thymine (T) form two hydrogen bonds (image).
• As three hydrogen bonds are stronger than two, the
separation of G and C requires more energy (in the form of
heat) than the separation of A and T.
• However, secondary structures can be avoided by adjusting the annealing temperature (in most cases, an
increase in temperature), avoiding cross homology, and changing the primers or DNA concentration. The
DNA concentration should be balanced with the number of cycles required in the reaction to enable the best
possible results.
• Runs of three or more Cs or Gs at the 3'-ends of primers may promote mispriming at G or C-rich sequences
(because of stability of annealing), and should be avoided.
https://www.bioinformatics.nl/cgi-bin/emboss/dotmatcher
Arrangement of domains as described in
Swiss-Prot entry
Hyderabad
Kanyakumari
6+6 only!!
However, detailed algorithm can be read from the book!!
Multiple sequence alignment and database searching
• Searching a database for homologues of a known protein is a central theme of
bioinformatics
• There are 3 important methods:
• A. Profiles
• B. PSI-BLAST
• C. Hidden Markov models (HMM)
• The goal is to find homologous sequences with high sensitivity and high specificity.
A. Profiles
• They express the patterns inherent in an MSA of a set of homologous sequences
• They help with the following:
• A. Greater accuracy in the alignment of distantly related sequences
• B. Sets of residues that are highly conserved are likely to be part of the active site and give
clues to function
• C. Identification of other homologous sequences
• D. Sets of residues that show little conservation tend to lie in surface loops, which is useful for
vaccine design
• E. Most structure prediction methods rely on profiles
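A minimal sketch of what a profile captures: per-column residue frequencies computed from an MSA, from which fully conserved positions can be read off. The toy alignment below is hypothetical (it is built around the CGPC motif of the thioredoxin active site, not taken from real entries), and the function names are made up. Real profiles also apply sequence weighting, pseudocounts and log-odds scoring.

```python
from collections import Counter


def build_profile(msa):
    """Per-column residue frequencies for equal-length aligned sequences."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):
        counts = Counter(column)
        profile.append({res: n / len(msa) for res, n in counts.items()})
    return profile


def conserved_positions(profile, threshold=1.0):
    """0-based columns where one residue (not a gap) reaches the threshold."""
    return [i for i, col in enumerate(profile)
            if "-" not in col and max(col.values()) >= threshold]


# Hypothetical toy alignment.
msa = ["WCGPCK", "WCGPCR", "WCGPCK", "YCGPCQ"]
profile = build_profile(msa)
print(profile[0])                    # {'W': 0.75, 'Y': 0.25}
print(conserved_positions(profile))  # [1, 2, 3, 4]
```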
Matching MSA of thioredoxins
from 25-30 position…
Many-to-one test
• One probe with complementary sequence
• To find the query oligo in a mixture, spread the mixture out and test each component of the mixture
• Northern or Southern blot
Many-to-many test
• A set of oligos are synthesized, one complementary to each sequence of query
• To detect many oligos in a mixture, they are prepared with different colored fluorescent tags for different…
• Microarrays, where DNA oligomers are affixed to known locations on a rigid support in a regular 2D…
• DNA microarrays analyze
• 1. the mRNAs in a cell to reveal the expression patterns of proteins; or
• 2. genomic DNA to reveal absent or mutated genes
https://microbenotes.com/dna-microarray/
• Data are prepared for analysis by image processing, checking internal
controls, dealing with missing data, selecting
reliable measurements, and putting the results on
consistent scales
• A change by a factor of 1.5-2 is considered significant;
each row or column is treated as a vector
• Two approaches for analysis:
• 1. comparisons focused on genes by
comparing rows
• 2. comparison focused on different samples
by comparing columns
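The row-wise (gene-focused) comparison can be sketched as a fold-change calculation on an expression matrix with genes as rows and samples as columns. The gene names and expression values below are hypothetical, and the 2-fold cutoff is one choice from the 1.5-2 range mentioned above.

```python
import math


def log2_fold_changes(matrix, control_col, treated_col):
    """log2(treated / control) for every gene (row) in the matrix."""
    return {gene: math.log2(values[treated_col] / values[control_col])
            for gene, values in matrix.items()}


def changed_genes(matrix, control_col, treated_col, min_fold=2.0):
    """Genes whose expression changes up or down by at least `min_fold`."""
    cutoff = math.log2(min_fold)
    changes = log2_fold_changes(matrix, control_col, treated_col)
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= cutoff)


# Hypothetical matrix: rows are genes, columns are [control, treated].
expression = {
    "geneA": [100, 400],  # 4-fold up
    "geneB": [200, 210],  # essentially unchanged
    "geneC": [300, 100],  # 3-fold down
}
print(changed_genes(expression, control_col=0, treated_col=1))  # ['geneA', 'geneC']
```

Comparing columns instead of rows would treat each sample as a vector of gene values, for example to group similar samples.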
https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/05_counting_reads.html
• About GEO2R: https://www.ncbi.nlm.nih.gov/geo/info/geo2r.html
• GEO2R: https://www.ncbi.nlm.nih.gov/geo/geo2r/
• Volcano plot: shows differentially expressed genes by plotting statistical significance
(-log10 p-value) against magnitude of change (log2 fold change)
• Mean difference plot: shows differentially expressed genes by plotting log2 fold change against average log2 expression
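The coordinates of a point on a volcano plot can be sketched as follows. The cutoffs used to call a gene differentially expressed (at least 2-fold change and p < 0.05) are assumptions for illustration, not GEO2R settings, and the function names are hypothetical.

```python
import math


def volcano_point(log2_fc, p_value):
    """(x, y) position of one gene: x = log2 fold change, y = -log10(p)."""
    return log2_fc, -math.log10(p_value)


def is_significant(log2_fc, p_value, min_abs_log2_fc=1.0, max_p=0.05):
    """True if the gene passes both the fold-change and the p-value cutoff."""
    return abs(log2_fc) >= min_abs_log2_fc and p_value < max_p


# Hypothetical gene: 4-fold up (log2 FC = 2) with p = 0.001.
x, y = volcano_point(2.0, 0.001)
print(x, round(y, 2), is_significant(2.0, 0.001))
```

Genes far from zero on the x-axis and high on the y-axis are the ones that stand out in the plot.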