16S rRNA Refseq V15.22 Genomic RefSeq V9.14
HOMD User Documentation     
4-91] Matrix
A key element in evaluating the quality of a pairwise sequence alignment is the "substitution matrix", which assigns a score for aligning any possible pair of residues. The theory of amino acid substitution matrices is described in [1], and applied to DNA sequence comparison in [2]. In general, different substitution matrices are tailored to detecting similarities among sequences that have diverged to differing degrees [1-3]. A single matrix may nevertheless be reasonably efficient over a relatively broad range of evolutionary change [1-3]. Experimentation has shown that the BLOSUM-62 matrix [4] is among the best for detecting most weak protein similarities. For particularly long and weak alignments, the BLOSUM-45 matrix may prove superior. A detailed statistical theory for gapped alignments has not been developed, and the best gap costs to use with a given substitution matrix are determined empirically.
Short alignments need to be relatively strong (i.e. have a higher percentage of matching residues) to rise above background noise. Such short but strong alignments are more easily detected using a matrix with a higher "relative entropy" [1] than that of BLOSUM-62. In particular, short query sequences can only produce short alignments, and therefore database searches with short queries should use an appropriately tailored matrix. The BLOSUM series does not include any matrices with relative entropies suitable for the shortest queries, so the older PAM matrices [5,6] may be used instead. For proteins, a provisional table of recommended substitution matrices and gap costs for various query lengths is:
Query length   Substitution matrix   Gap costs
------------   -------------------   ---------
<35            PAM-30                ( 9,1)
35-50          PAM-70                (10,1)
50-85          BLOSUM-80             (10,1)
>85            BLOSUM-62             (11,1)
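The table lookup can be sketched in Python as follows. This is only an illustration: the function name `recommend_scoring` is hypothetical, and because the table lists 50 and 85 in two rows each, the sketch assumes that a query of length 50 uses PAM-70 and a query of length 85 uses BLOSUM-80. The source does not say how these boundaries should be resolved.

```python
def recommend_scoring(query_length):
    """Return (matrix_name, (gap_existence, gap_extension)) for a protein query.

    Follows the provisional table; the handling of the shared
    boundaries 50 and 85 is an assumption of this sketch.
    """
    if query_length < 1:
        raise ValueError("query length must be a positive integer")
    if query_length < 35:
        return "PAM-30", (9, 1)
    if query_length <= 50:   # assumption: 50 falls in the 35-50 row
        return "PAM-70", (10, 1)
    if query_length <= 85:   # assumption: 85 falls in the 50-85 row
        return "BLOSUM-80", (10, 1)
    return "BLOSUM-62", (11, 1)


if __name__ == "__main__":
    for length in (20, 40, 70, 300):
        print(length, recommend_scoring(length))
```

The returned pair of gap costs corresponds to the (existence, per-residue) values in the third column of the table.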
Gap Costs
The raw score of an alignment is the sum of the scores for aligning pairs of residues and the scores for gaps. Gapped BLAST and PSI-BLAST use "affine gap costs" which charge the score -a for the existence of a gap, and the score -b for each residue in the gap. Thus a gap of k residues receives a total score of -(a+bk); specifically, a gap of length 1 receives the score -(a+b).
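As a small worked illustration of the affine gap formula, the following Python sketch computes the score of a single gap and the raw score of an alignment. The function names and the example pair scores are hypothetical and chosen only to make the arithmetic visible; they are not part of BLAST itself.

```python
def gap_score(k, a, b):
    """Score of one gap of k residues under affine costs: -(a + b*k)."""
    if k < 1:
        raise ValueError("a gap must contain at least one residue")
    return -(a + b * k)


def raw_score(pair_scores, gap_lengths, a, b):
    """Raw alignment score: sum of aligned-pair scores plus all gap scores."""
    return sum(pair_scores) + sum(gap_score(k, a, b) for k in gap_lengths)


if __name__ == "__main__":
    # With costs (11,1): a gap of length 1 scores -(11+1) = -12,
    # and a gap of length 3 scores -(11+3) = -14.
    print(gap_score(1, 11, 1), gap_score(3, 11, 1))
    # Hypothetical alignment: three aligned pairs and one gap of length 2.
    print(raw_score([4, 5, 6], [2], 11, 1))
```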
Lambda Ratio
To convert a raw score S into a normalized score S' expressed in bits, one uses the formula S' = (lambda*S - ln K)/(ln 2), where lambda and K are parameters dependent upon the scoring system (substitution matrix and gap costs) employed [7-9]. For determining S', the more important of these parameters is lambda. The "lambda ratio" quoted here is the ratio of the lambda for the given scoring system to that for one using the same substitution scores, but with infinite gap costs [8]. This ratio indicates what proportion of information in an ungapped alignment must be sacrificed in the hope of improving its score through extension using gaps. We have found empirically that the most effective gap costs tend to be those with lambda ratios in the range 0.8 to 0.9.
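The bit-score conversion and the lambda ratio can be sketched in Python as below. The function names are hypothetical. The numeric parameters in the usage section (lambda = 0.267 and K = 0.041 for gapped BLOSUM-62 with gap costs (11,1), and lambda = 0.3176 for the ungapped case) are illustrative values of the kind printed in BLAST output; they are not given in this document and should be treated as assumptions.

```python
import math


def bit_score(raw, lam, k):
    """Normalized score in bits: S' = (lambda*S - ln K) / ln 2."""
    if lam <= 0 or k <= 0:
        raise ValueError("lambda and K must be positive")
    return (lam * raw - math.log(k)) / math.log(2)


def lambda_ratio(lam_gapped, lam_ungapped):
    """Ratio of the gapped lambda to the lambda with infinite gap costs."""
    return lam_gapped / lam_ungapped


if __name__ == "__main__":
    # Assumed, illustrative parameters (not taken from this document).
    lam_gapped, k_gapped = 0.267, 0.041
    lam_ungapped = 0.3176
    print(round(bit_score(100, lam_gapped, k_gapped), 2))
    print(round(lambda_ratio(lam_gapped, lam_ungapped), 3))
```

With these assumed values the ratio falls inside the 0.8 to 0.9 range described above.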
  • [1] Altschul, S.F. (1991) "Amino acid substitution matrices from an information theoretic perspective." J. Mol. Biol. 219:555-565.
  • [2] States, D.J., Gish, W. & Altschul, S.F. (1991) "Improved sensitivity of nucleic acid database searches using application-specific scoring matrices." Methods 3:66-70.
  • [3] Altschul, S.F. (1993) "A protein alignment scoring system sensitive at all evolutionary distances." J. Mol. Evol. 36:290-300.
  • [4] Henikoff, S. & Henikoff, J.G. (1992) "Amino acid substitution matrices from protein blocks." Proc. Natl. Acad. Sci. USA 89:10915-10919.
  • [5] Dayhoff, M.O., Schwartz, R.M. & Orcutt, B.C. (1978) "A model of evolutionary change in proteins." In "Atlas of Protein Sequence and Structure, vol. 5, suppl. 3," M.O. Dayhoff (ed.), pp. 345-352, Natl. Biomed. Res. Found., Washington, DC.
  • [6] Schwartz, R.M. & Dayhoff, M.O. (1978) "Matrices for detecting distant relationships." In "Atlas of Protein Sequence and Structure, vol. 5, suppl. 3," M.O. Dayhoff (ed.), pp. 353-358, Natl. Biomed. Res. Found., Washington, DC.
  • [7] Karlin, S. & Altschul, S.F. (1990) "Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes." Proc. Natl. Acad. Sci. USA 87:2264-2268.
  • [8] Altschul, S.F. & Gish, W. (1996) "Local alignment statistics." Meth. Enzymol. 266:460-480.
  • [9] Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402.
Article last modified on 2014-04-08 11:01:10 by lyang; Category: User Documentation; Topic: Tools & Downloa
    Copyright 2007-2022 The Forsyth Institute