{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T10:12:36Z","timestamp":1761559956350},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"S2","license":[{"start":{"date-parts":[[2005,7,1]],"date-time":"2005-07-01T00:00:00Z","timestamp":1120176000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2005,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Clustering the ESTs from a large dataset representing a single species is a convenient starting point for a number of investigations into gene discovery, genome evolution, expression patterns, and alternatively spliced transcripts. Several methods have been developed to accomplish this, the most widely available being UniGene, a public domain collection of gene-oriented clusters for over 45 different species created and maintained by NCBI. The goal is for each cluster to represent a unique gene, but currently it is not known how closely the overall results represent that reality. UniGene's build procedure begins with initial mRNA clusters before joining ESTs. UniGene's results for soybean indicate a significant amount of redundancy among some sequences reported to be unique mRNAs. To establish a valid non-redundant known gene set for <jats:italic>Glycine max<\/jats:italic>, we applied our algorithm to the clustering of only mRNA sequences. The mRNA dataset was run through the algorithm using two different matching stringencies. The resulting cluster compositions were compared to each other and to UniGene. 
Clusters exhibiting differences among the three methods were analyzed by 1) nucleotide and amino acid alignment and 2) the submitting authors' conclusions to determine whether members of a single cluster represented the same gene or not.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Of the 12 clusters that were examined closely, most contained examples of sequences that did not belong in the same cluster. However, neither the two stringencies of PECT nor UniGene had a significantly greater record of accuracy in placing paralogs into separate clusters.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>Our results reveal that, although each method produces some errors, using multiple stringencies for matching or a sequential hierarchical method of increasing stringencies can provide more reliable results and therefore allow greater confidence in the vast majority of clusters that contain only ESTs and no mRNA sequences.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-6-s2-s7","type":"journal-article","created":{"date-parts":[[2005,7,16]],"date-time":"2005-07-16T09:00:41Z","timestamp":1121504441000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Evaluation of Glycine max mRNA clusters"],"prefix":"10.1186","volume":"6","author":[{"given":"Ronald L","family":"Frank","sequence":"first","affiliation":[]},{"given":"Fikret","family":"Ercal","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,7,15]]},"reference":[{"key":"665_CR1","doi-asserted-by":"crossref","first-page":"524","DOI":"10.1101\/gr.8.5.524","volume":"8","author":"D Gautheret","year":"1998","unstructured":"Gautheret D, Poirot O, Lopez F, Audic S, Claverie J: Alternate polyadenylation in Human mRNAs: A large-scale analysis by EST clustering. 
Genome Research 1998, 8: 524\u2013530.","journal-title":"Genome Research"},{"key":"665_CR2","first-page":"79","volume":"6","author":"G Wistow","year":"2000","unstructured":"Wistow G, Sardarian L, Gan W, Wyatt K: The human gene for \u03b3 S-crystallin:Alternate transcripts and expressed sequences from the first intron. Molecular Vision 2000, 6: 79\u201384.","journal-title":"Molecular Vision"},{"key":"665_CR3","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1093\/nar\/30.1.186","volume":"30","author":"Y Huang","year":"2002","unstructured":"Huang Y, Chen Y, Lai J, Yang S, Yang U: PALS db: Putative alternative splicing database. Nucleic Acids Research 2002, 30: 186\u2013190. 10.1093\/nar\/30.1.186","journal-title":"Nucleic Acids Research"},{"key":"665_CR4","doi-asserted-by":"publisher","first-page":"615","DOI":"10.1089\/dna.2004.23.615","volume":"23","author":"R Mudhireddy","year":"2004","unstructured":"Mudhireddy R, Ercal F, Frank R: Parallel hash-based EST clustering algorithm for gene sequencing. DNA and Cell Biology 2004, 23: 615\u2013623. 10.1089\/dna.2004.23.615","journal-title":"DNA and Cell Biology"},{"key":"665_CR5","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1139\/g02-032","volume":"45","author":"C Granger","year":"2002","unstructured":"Granger C, Coryell V, Khanna A, Keim P, Vodkin L, Shoemaker R: Identification, structure, and differential expression of members of a BURP domain containing protein family in soybean. Genome 2002, 45: 693\u2013701. 10.1139\/g02-032","journal-title":"Genome"},{"key":"665_CR6","doi-asserted-by":"publisher","first-page":"8245","DOI":"10.1093\/nar\/10.24.8245","volume":"10","author":"MA Schuler","year":"1982","unstructured":"Schuler MA, Ladin BF, Pollaco JC, Freyer G, Beachy RN: Structural sequences are conserved in the genes coding for the alpha, alpha' and beta-subunits of the soybean 7S seed storage protein. 
Nucleic Acids Res 1982, 10: 8245\u20138261.","journal-title":"Nucleic Acids Res"},{"key":"665_CR7","doi-asserted-by":"publisher","first-page":"1071","DOI":"10.1093\/nar\/25.5.1071","volume":"25","author":"AJ McCullough","year":"1997","unstructured":"McCullough AJ, Schuler MA: Intronic and exonic sequences modulate 5' splice site selection in plant nuclei. Nucleic Acids Res 1997, 25: 1071\u20131077. 10.1093\/nar\/25.5.1071","journal-title":"Nucleic Acids Res"},{"key":"665_CR8","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1111\/j.1432-1033.1996.0221t.x","volume":"241","author":"AD Shutov","year":"1996","unstructured":"Shutov AD, Kakhovskaya IA, Bastrygina AS, Bulmaga VP, Horstmann C, Muntz K: Limited proteolysis of beta-conglycinin and glycinin, the 7S and 11S storage globulins from soybean [Glycine max (L.) Merr.]. Structural and evolutionary implications. Eur J Biochem 1996, 241: 221\u2013228. 10.1111\/j.1432-1033.1996.0221t.x","journal-title":"Eur J Biochem"},{"key":"665_CR9","doi-asserted-by":"publisher","first-page":"854","DOI":"10.1046\/j.1432-1327.1998.2580854.x","volume":"258","author":"N Maruyama","year":"1998","unstructured":"Maruyama N, Katsube T, Wada Y, Oh MH, Barba De La Rosa AP, Okuda E, Nakagawa S, Utsumi S: The roles of the N-linked glycans and extension regions of soybean beta-conglycinin in folding, assembly and structural features. Eur J Biochem 1998, 258: 854\u2013862. 10.1046\/j.1432-1327.1998.2580854.x","journal-title":"Eur J Biochem"},{"key":"665_CR10","doi-asserted-by":"publisher","first-page":"5040","DOI":"10.1073\/pnas.82.15.5040","volume":"82","author":"T Nguyen","year":"1985","unstructured":"Nguyen T, Zelechowska M, Foster H, Bergmann H, Verma DP: Primary structure of the soybean noduli-35 gene encoding uricase II localized in the peroxisomes of uninfected cells of nodules. Proc Natl Acad Sci USA 1985, 82: 5040\u20135044. 
10.1073\/pnas.82.15.5040","journal-title":"Proc Natl Acad Sci USA"},{"key":"665_CR11","doi-asserted-by":"publisher","first-page":"384","DOI":"10.1104\/pp.95.2.384","volume":"95","author":"H Suzuki","year":"1991","unstructured":"Suzuki H, Verma D: Soybean nodule-specific uricase (Nodulin-35) is expressed and assembled into a functional tetrameric holoenzyme in Escherichia coli. Plant Physiol 1991, 95: 384\u2013389.","journal-title":"Plant Physiol"},{"key":"665_CR12","doi-asserted-by":"publisher","first-page":"1338","DOI":"10.1093\/nar\/19.6.1338","volume":"19","author":"JE Bergmann","year":"1991","unstructured":"Bergmann JE, Preddie E, Cortes L, Brousseau R: A protein drp90 encoded on the leftwards strand of soybean nodule urate oxidase cDNA binds to a regulatory sequence in leghemoglobin C3 gene. Nucleic Acids Res 1991, 19: 1338.","journal-title":"Nucleic Acids Res"},{"key":"665_CR13","doi-asserted-by":"publisher","first-page":"661","DOI":"10.1104\/pp.103.2.661","volume":"103","author":"M Chatfield","year":"1993","unstructured":"Chatfield M, Dalton DA: Ascorbate peroxidase from soybean root nodules. Plant Physiol 1993, 103: 661\u2013662. 10.1104\/pp.103.2.661","journal-title":"Plant Physiol"},{"key":"665_CR14","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1016\/S1016-8478(23)13525-4","volume":"9","author":"SC Lee","year":"1999","unstructured":"Lee SC, Kang BG, Oh SE: Induction of ascorbate peroxidase by ethylene and hydrogen peroxide during growth of cultured soybean cells. Mol Cells 1999, 9: 166\u2013171.","journal-title":"Mol Cells"},{"key":"665_CR15","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1016\/0014-5793(85)80886-3","volume":"188","author":"T Momma","year":"1985","unstructured":"Momma T, Negoro T, Udaka K, Fukazawa C: A complete cDNA coding for the sequence of glycinin A2B1a subunit precursor. FEBS Lett 1985, 188: 117\u2013122. 
10.1016\/0014-5793(85)80886-3","journal-title":"FEBS Lett"},{"key":"665_CR16","doi-asserted-by":"publisher","first-page":"6719","DOI":"10.1093\/nar\/13.18.6719","volume":"13","author":"T Negoro","year":"1985","unstructured":"Negoro T, Momma T, Fukazawa C: A cDNA clone encoding a glycinin A1a subunit precursor of soybean. Nucleic Acids Res 1985, 13: 6719\u20136731.","journal-title":"Nucleic Acids Res"},{"key":"665_CR17","doi-asserted-by":"publisher","first-page":"3267","DOI":"10.1271\/bbb1961.51.3267","volume":"51","author":"S Utsumi","year":"1987","unstructured":"Utsumi S, Kim C, Kohno M, Kito M: Polymorphism and expression of cDNAs encoding glycinin subunits. Agric Biol Chem 1987, 51: 3267\u20133273.","journal-title":"Agric Biol Chem"},{"key":"665_CR18","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1021\/jf00074a011","volume":"35","author":"S Utsumi","year":"1987","unstructured":"Utsumi S, Kohno M, Mori T, Kito M: An alternate cDNA encoding glycinin A-1a-B-x subunit. J Agric Food Chem 1987, 35: 210\u2013214. 10.1021\/jf00074a011","journal-title":"J Agric Food Chem"},{"key":"665_CR19","doi-asserted-by":"publisher","first-page":"456","DOI":"10.1007\/BF00280303","volume":"230","author":"E Bell","year":"1991","unstructured":"Bell E, Mullet JE: Lipoxygenase gene expression is modulated in plants by water deficit, wounding, and methyl jasmonate. Mol Gen Genet 1991, 230: 456\u2013462. 10.1007\/BF00280303","journal-title":"Mol Gen Genet"},{"key":"665_CR20","doi-asserted-by":"publisher","first-page":"1319","DOI":"10.1105\/tpc.7.8.1319","volume":"7","author":"TW Bunker","year":"1995","unstructured":"Bunker TW, Koetje DS, Stephenson LC, Creelman RA, Mullet JE, Grimes HD: Sink limitation induces the expression of multiple soybean vegetative lipoxygenase mRNAs while the endogenous jasmonic acid level remains low. Plant Cell 1995, 7: 1319\u20131331. 
10.1105\/tpc.7.8.1319","journal-title":"Plant Cell"},{"key":"665_CR21","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1104\/pp.110.1.287","volume":"110","author":"DM Saravitz","year":"1996","unstructured":"Saravitz DM, Siedow JN: The differential expression of wound-inducible lipoxygenase genes in soybean leaves. Plant Physiol 1996, 110: 287\u2013299. 10.1104\/pp.110.1.287","journal-title":"Plant Physiol"},{"key":"665_CR22","doi-asserted-by":"publisher","first-page":"909","DOI":"10.1007\/BF00019389","volume":"14","author":"BW Shirley","year":"1990","unstructured":"Shirley BW, Ham DP, Senecoff JF, Berry-Lowe SL, Zurfluh LL, Shah DM, Meagher RB: Comparison of the expression of two highly homologous members of the soybean ribulose-1, 5-bisphosphate carboxylase small subunit gene family. Plant Mol Biol 1990, 14: 909\u2013925. 10.1007\/BF00019389","journal-title":"Plant Mol Biol"},{"key":"665_CR23","doi-asserted-by":"publisher","first-page":"8222","DOI":"10.1073\/pnas.88.18.8222","volume":"88","author":"AM Lescure","year":"1991","unstructured":"Lescure AM, Proudhon D, Pesey H, Ragland M, Theil EC, Briat JF: Ferritin gene transcription is regulated by iron in soybean cell cultures. Proc Natl Acad Sci U S A 1991, 88: 8222\u20138226. 10.1073\/pnas.88.18.8222","journal-title":"Proc Natl Acad Sci U S A"},{"key":"665_CR24","doi-asserted-by":"crossref","first-page":"18339","DOI":"10.1016\/S0021-9258(17)44757-0","volume":"265","author":"M Ragland","year":"1990","unstructured":"Ragland M, Briat JF, Gagnon J, Laulhere JP, Massenet O, Theil EC: Evidence for conservation of ferritin sequences among plants and animals and for a transit peptide in soybean. 
J Biol Chem 1990, 265: 18339\u201318344.","journal-title":"J Biol Chem"},{"key":"665_CR25","doi-asserted-by":"publisher","first-page":"1025","DOI":"10.1104\/pp.103.3.1025","volume":"103","author":"A Vazquez-Tello","year":"1993","unstructured":"Vazquez-Tello A, Whittier RF, Kawasaki T, Sugimoto T, Kawamura Y, Shibata D: Sequence of a soybean (Glycine max L.) phosphoenolpyruvate carboxylase cDNA. Plant Physiol 1993, 103: 1025\u20131026. 10.1104\/pp.103.3.1025","journal-title":"Plant Physiol"},{"key":"665_CR26","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1046\/j.1365-313X.1998.00022.x","volume":"13","author":"S Hata","year":"1998","unstructured":"Hata S, Izui K, Kouchi H: Expression of a soybean nodule-enhanced phosphoenolpyruvate carboxylase gene that shows striking similarity to another gene for a house-keeping isoform. Plant J 1998, 13: 267\u2013273. 10.1046\/j.1365-313X.1998.00022.x","journal-title":"Plant J"},{"key":"665_CR27","doi-asserted-by":"publisher","first-page":"2078","DOI":"10.1104\/pp.104.042762","volume":"135","author":"S Sullivan","year":"2004","unstructured":"Sullivan S, Jenkins GI, Nimmo HG: Roots, cycles and leaves. Expression of the phosphoenolpyruvate carboxylase kinase gene family in soybean. Plant Physiol 2004, 135: 2078\u20132087. 10.1104\/pp.104.042762","journal-title":"Plant Physiol"},{"key":"665_CR28","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1046\/j.1365-313X.2003.01740.x","volume":"34","author":"W Xu","year":"2003","unstructured":"Xu W, Zhou Y, Chollet R: Identification and expression of a soybean nodule-enhanced PEP-carboxylase kinase gene (NE-PpcK) that shows striking up-\/down-regulation in vivo. Plant J 2003, 34: 441\u2013452. 
10.1046\/j.1365-313X.2003.01740.x","journal-title":"Plant J"},{"key":"665_CR29","doi-asserted-by":"publisher","first-page":"404","DOI":"10.1007\/BF00281790","volume":"242","author":"RS Torisky","year":"1994","unstructured":"Torisky RS, Griffin JD, Yenofsky RL, Polacco JC: A single gene (Eu4) encodes the tissue-ubiquitous urease of soybean. Mol Gen Genet 1994, 242: 404\u2013414. 10.1007\/BF00281790","journal-title":"Mol Gen Genet"},{"key":"665_CR30","doi-asserted-by":"publisher","first-page":"1801","DOI":"10.1104\/pp.103.022699","volume":"132","author":"A Goldraij","year":"2003","unstructured":"Goldraij A, Beamer LJ, Polacco JC: Interallelic complementation at the ubiquitous urease coding locus of soybean. Plant Physiol 2003, 132: 1801\u20131810. 10.1104\/pp.103.022699","journal-title":"Plant Physiol"},{"key":"665_CR31","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1007\/BF00330430","volume":"208","author":"BJ Scallon","year":"1987","unstructured":"Scallon BJ, Dickinson CD, Nielsen NC: Characterization of a null-allele for the Gy4 glycinin gene from soybean. Mol Gen Genet 1987, 208: 107\u2013113. 10.1007\/BF00330430","journal-title":"Mol Gen Genet"},{"key":"665_CR32","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1111\/j.1574-6968.1999.tb13575.x","volume":"174","author":"TA Tatusova","year":"1999","unstructured":"Tatusova TA, Madden TL: Blast 2 sequences \u2013 a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 1999, 174: 247\u2013250. 10.1016\/S0378-1097(99)00149-4","journal-title":"FEMS Microbiol Lett"},{"key":"665_CR33","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1101\/gr.9.11.1135","volume":"9","author":"J Burke","year":"1999","unstructured":"Burke J, Davison D, Hide W: d2_cluster: A validated method for clustering EST and full-length cDNA sequences. Genome Research 1999, 9: 1135\u20131142. 
10.1101\/gr.9.11.1135","journal-title":"Genome Research"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-S2-S7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1471-2105-6-S2-S7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-S2-S7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,27]],"date-time":"2024-01-27T22:37:05Z","timestamp":1706395025000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-S2-S7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,7]]},"references-count":33,"journal-issue":{"issue":"S2","published-print":{"date-parts":[[2005,7]]}},"alternative-id":["665"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-s2-s7","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,7]]},"assertion":[{"value":"15 July 2005","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S7"}}