Unit2 2
The idea of a block is derived from the more familiar notion of a motif,
which usually refers to a conserved stretch of amino acids that
confers a specific function or structure on a protein.
When these individual motifs from proteins in the same family can
be aligned without introducing a gap, the result is a block.
With these protein blocks in hand, it was then possible to look for
substitution patterns only in the most conserved regions of a protein,
the regions that were least prone to change.
Pij is the probability that amino acid i is replaced by amino acid j.
qi and qj are the background probabilities of finding amino acids i and j in any protein
sequence.
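The two quantities defined above are the ingredients of a log-odds substitution score. The following is a minimal sketch, assuming the score is the base-2 logarithm of Pij / (qi × qj); the scaling and rounding used by any particular matrix are not stated here, and the function name and the probabilities are made up for illustration.

```python
import math

def log_odds(p_ij, q_i, q_j):
    """Log-odds score (in bits) for seeing i aligned with j.

    p_ij : observed probability of the i/j substitution
    q_i, q_j : background probabilities of i and j
    Assumption: base-2 logarithm, no extra scaling or rounding.
    """
    return math.log2(p_ij / (q_i * q_j))

# Hypothetical numbers, for illustration only.
# A pair seen more often than chance predicts scores above zero,
# a pair seen less often than chance scores below zero.
print(log_odds(0.04, 0.1, 0.1))
print(log_odds(0.005, 0.1, 0.1))
```

A positive score means the pair occurs more often in conserved blocks than the background frequencies alone would predict; a negative score means it occurs less often.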
Selecting an appropriate Scoring Matrix
Matrix          Best Use          Similarity (%)
G + Ln
$E = kmN\,e^{-s}$
k is a minor constant
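To get a feel for how the expectation formula above behaves, here is a small sketch that evaluates E = kmN e^(-s) directly. It assumes that m is the query length, N is the database length, and s is an already normalized score; those readings, the function name, and the numbers are assumptions for illustration, not values from a real search.

```python
import math

def expect_value(k, m, n_db, s):
    """Evaluate E = k * m * N * exp(-s).

    k    : the minor constant from the formula
    m    : assumed to be the query length
    n_db : assumed to be the total database length (N in the formula)
    s    : assumed to be a normalized alignment score
    """
    return k * m * n_db * math.exp(-s)

# Hypothetical inputs: E falls exponentially as the score rises
# and grows linearly with the size of the search space.
for s in (20.0, 30.0, 40.0):
    print(s, expect_value(k=0.1, m=300, n_db=1_000_000, s=s))
```

Under these assumptions, doubling the database size doubles E, while raising the score by ln 2 halves it.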
http://www.ncbi.nlm.nih.gov/books/NBK62051/
PSI-BLAST
http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml
Sequence          Position w/in query   Position w/in DB   Offset (Q minus DB)
(total of 256)
TCTC              1,3                   2,4                -1 or -3 or 1
CTCT              2                     3                  -1
TTCT              -                     1                  -
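The lookup table above can be reproduced with a short sketch. The sequences below (query TCTCTC, database TTCTCTC) are assumptions chosen because they are consistent with the positions in the table; the slide itself does not show them. The function names are illustrative.

```python
from collections import Counter, defaultdict

def ktuple_positions(seq, ktup):
    """Map each word of length ktup to its 1-based start positions in seq."""
    table = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        table[seq[i:i + ktup]].append(i + 1)
    return table

def offset_counts(query, db, ktup):
    """Count word hits per offset, where offset = query position - DB position."""
    q_table = ktuple_positions(query, ktup)
    counts = Counter()
    for j in range(len(db) - ktup + 1):
        word = db[j:j + ktup]
        for q_pos in q_table.get(word, ()):
            counts[q_pos - (j + 1)] += 1
    return counts

# Assumed sequences, consistent with the table (ktup = 4, so 4**4 = 256 words).
query, db = "TCTCTC", "TTCTCTC"
print(dict(ktuple_positions(query, 4)))   # query word positions
print(offset_counts(query, db, 4))        # hits grouped by offset
```

Hits that share an offset lie on the same diagonal of a dot plot, so the offset with the most hits (here -1) marks the most promising diagonal.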
FASTA Steps
[Figure: four panels, numbered 1 to 4. Labels: different offset values; identical offset values in a contiguous sequence; diagonals are extended.]
2. The longest diagonals are scored again using the PAM matrix (or
another matrix). The best scores are saved as init1 scores.
4. Long diagonals that are neighbors are joined. The score for
this joined region is initn. This score may be lower because of a
gap penalty.
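The rescoring in step 2 can be sketched as finding the best-scoring ungapped stretch along one diagonal. This is a simplified illustration, not FASTA's actual code: a match/mismatch function (+5 / -4, an arbitrary choice) stands in for the PAM matrix, and the sequences and function names are assumptions.

```python
def simple_score(a, b):
    """Stand-in for a substitution matrix lookup (assumed values)."""
    return 5 if a == b else -4

def diagonal_pairs(query, db, offset):
    """Yield residue pairs on the diagonal where query index - DB index == offset."""
    for qi in range(len(query)):
        di = qi - offset
        if 0 <= di < len(db):
            yield query[qi], db[di]

def init1_like_score(query, db, offset, score=simple_score):
    """Best score of any contiguous, ungapped segment on the given diagonal."""
    best = running = 0
    for a, b in diagonal_pairs(query, db, offset):
        # Drop the running segment once its score would go negative.
        running = max(0, running + score(a, b))
        best = max(best, running)
    return best

# Assumed example sequences; offset -1 is the diagonal with the most word hits.
print(init1_like_score("TCTCTC", "TTCTCTC", -1))
print(init1_like_score("TCTCTC", "TTCTCTC", 0))
```

Joining neighboring diagonals (step 4) would add the scores of compatible segments and subtract a gap penalty, which is why initn can end up lower than the simple sum of its parts.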
The higher the ktup value, the less likely you are to get a
match unless the sequences are identical (remember the dot plots).
The lower the ktup value, the more background you will
have.
$$\text{Stand. Dev.} = \sqrt{\frac{\sum \text{scores}^2 - \dfrac{\left(\sum \text{scores}\right)^2}{\text{Total \# of Sequences}}}{\text{Total \# of Sequences}}}$$
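The standard deviation formula above can be checked numerically. The sketch below computes it from the sum of scores and the sum of squared scores, then uses it for a plain z-score, (score - mean) / standard deviation. That plain definition is an assumption about what the slide intends; the z-scores FASTA reports may involve further normalization. The scores are made up.

```python
import math

def score_stats(scores):
    """Return (mean, standard deviation) using the sums-of-squares formula."""
    n = len(scores)                       # Total # of Sequences
    total = sum(scores)                   # sum of scores
    total_sq = sum(s * s for s in scores) # sum of squared scores
    mean = total / n
    sd = math.sqrt((total_sq - total * total / n) / n)
    return mean, sd

def z_score(score, mean, sd):
    """Number of standard deviations a score lies above the mean (assumed definition)."""
    return (score - mean) / sd

# Hypothetical database scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sd = score_stats(scores)
print(mean, sd, z_score(9, mean, sd))
```

A score far above the mean of the database scores yields a large z-score, which is what makes it unlikely to have arisen by chance.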
FASTA Statistics
Using the distribution of the z-scores in the database, the
FASTA program can estimate the number of sequences that
would be expected to produce, purely by chance, a z-score
greater than or equal to the z-score obtained in the search.
Good
SCORES Init1: 719 Initn: 748 Opt: 793
z-score: 734.0 E(): 3.8e-34
Smith-Waterman score: 796; 41.3% identity in 378 overlap
Mediocre
SCORES Init1: 249 Initn: 304 Opt: 260
z-score: 243.2 E(): 8.3e-07
Smith-Waterman score: 270; 35.0% identity in 183 overlap
When to use the correct program
Problem                    Program           Explanation
Identify Unknown Protein   BLASTP; FASTA3    General protein comparison. Use ktup=2
                                             for speed; ktup=1 for sensitive search.
BLASTX-nucleotide query-protein DB