{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T23:27:28Z","timestamp":1772753248271,"version":"3.50.1"},"reference-count":47,"publisher":"Wiley","issue":"13","license":[{"start":{"date-parts":[[2005,9,1]],"date-time":"2005-09-01T00:00:00Z","timestamp":1125532800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Am. Soc. Inf. Sci."],"published-print":{"date-parts":[[2005,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>A potentially useful feature of information retrieval systems for students is the ability to identify documents that not only are relevant to the query but also match the student's reading level. Manually obtaining an estimate of reading difficulty for each document is not feasible for very large collections, so we require an automated technique. Traditional readability measures, such as the widely used Flesch\u2010Kincaid measure, are simple to apply but perform poorly on Web pages and other nontraditional documents. This work focuses on building a broadly applicable statistical model of text for different reading levels that works for a wide range of documents. To do this, we recast the well\u2010studied problem of readability in terms of text categorization and use straightforward techniques from statistical language modeling. We show that with a modified form of text categorization, it is possible to build generally applicable classifiers with relatively little training data. 
We apply this method to the problem of classifying Web pages according to their reading difficulty level and show that by using a mixture model to interpolate evidence of a word's frequency across grades, it is possible to build a classifier that achieves an average root mean squared error of between one and two grade levels for 9 of 12 grades. Such classifiers have very efficient implementations and can be applied in many different scenarios. The models can be varied to focus on smaller or larger grade ranges or easily retrained for a variety of tasks or populations.<\/jats:p>","DOI":"10.1002\/asi.20243","type":"journal-article","created":{"date-parts":[[2005,9,1]],"date-time":"2005-09-01T18:46:30Z","timestamp":1125600390000},"page":"1448-1462","source":"Crossref","is-referenced-by-count":86,"title":["Predicting reading difficulty with statistical language models"],"prefix":"10.1002","volume":"56","author":[{"given":"Kevyn","family":"Collins\u2010Thompson","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jamie","family":"Callan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2005,9]]},"reference":[{"key":"e_1_2_9_2_1","first-page":"77","volume-title":"Comprehension and teaching","author":"Anderson R.C.","year":"1981"},{"key":"e_1_2_9_3_1","unstructured":"Andrews L.W.(2001).Writing well about health."},{"key":"e_1_2_9_3_2","unstructured":"Retrieved July 2002 from http:\/\/linda\u2010andrews.com\/readability_tool.htm"},{"key":"e_1_2_9_4_1","doi-asserted-by":"publisher","DOI":"10.2307\/747021"},{"key":"e_1_2_9_5_1","volume-title":"Word frequency book","author":"Carroll J.B.","year":"1971"},{"key":"e_1_2_9_6_1","volume-title":"Readability: An appraisal of research and application (Bureau of Educational Research Monographs, No. 
34, Columbus: Ohio State University Press)","author":"Chall J.S.","year":"1958"},{"key":"e_1_2_9_7_1","volume-title":"Stages of reading development","author":"Chall J.S.","year":"1983"},{"key":"e_1_2_9_8_1","volume-title":"Readability revisited: The New Dale\u2010Chall Readability Formula","author":"Chall J.S.","year":"1995"},{"key":"e_1_2_9_9_1","doi-asserted-by":"publisher","DOI":"10.2307\/747339"},{"key":"e_1_2_9_10_1","unstructured":"Crawford W.J. King C.E. Brophy J.E. &Evertson C.M.(1975).Error rates and question difficulty related to elementary children's learning (Report No. 75\u20108). Austin: University of Texas at Austin Research and Development Center for Teacher Education."},{"key":"e_1_2_9_11_1","volume-title":"The living word vocabulary","author":"Dale E.","year":"1981"},{"key":"e_1_2_9_12_1","doi-asserted-by":"publisher","DOI":"10.1177\/107769906304000207"},{"key":"e_1_2_9_13_1","unstructured":"DeVries H.(1999).Reading Ease@WWW. MSc Research Report. Macquarie University Sydney Australia."},{"key":"e_1_2_9_13_2","unstructured":"Retrieved from http:\/\/web.archive.org\/web\/20001212150500\/http:\/\/www.shlrc.mq.edu.au\/\u02dchdevries\/RE.html"},{"key":"e_1_2_9_14_1","volume-title":"Pattern classification (2nd ed.)","author":"Duda R.O.","year":"2001"},{"key":"e_1_2_9_15_1","doi-asserted-by":"publisher","DOI":"10.1080\/01638539809545029"},{"key":"e_1_2_9_16_1","first-page":"594","article-title":"A readability formula for short passages","author":"Fry E.","year":"1990","journal-title":"Journal of Reading"},{"key":"e_1_2_9_17_1","doi-asserted-by":"publisher","DOI":"10.1080\/09296179508590051"},{"key":"e_1_2_9_18_1","unstructured":"Gilutz S. &Nielsen J.(2002).Kids' corner: Website usability for children. 
AlertBox column (April 14 2002)."},{"key":"e_1_2_9_18_2","unstructured":"Retrieved July 2002 from http:\/\/www.useit.com\/alertbox\/20020414.html"},{"key":"e_1_2_9_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-21606-5"},{"key":"e_1_2_9_20_1","unstructured":"Kincaid J. Fishburne R. Rodgers R. &Chissom B.(1975).Derivation of new readability formulas for navy enlisted personnel (Branch Report 8\u201375). Millington TN: Chief of Naval Training."},{"key":"e_1_2_9_21_1","volume-title":"The measurement of readability","author":"Klare G.R.","year":"1963"},{"key":"e_1_2_9_22_1","first-page":"1137","volume-title":"A study of cross\u2010validation and bootstrap for accuracy estimation and model selection","author":"Kohavi R.","year":"1995"},{"key":"e_1_2_9_23_1","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.104.2.211"},{"key":"e_1_2_9_24_1","unstructured":"Makuta M.H.(1998).A computational model of lexical cohesion analysis and its application to the evaluation of text coherence. Unpublished Ph.D. thesis University of Waterloo Waterloo Ontario Canada."},{"key":"e_1_2_9_25_1","first-page":"359","volume-title":"Improving text classification by shrinkage in a hierarchy of classes","author":"McCallum A.","year":"1998"},{"issue":"2","key":"e_1_2_9_26_1","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1111\/j.2517-6161.1980.tb01109.x","article-title":"Regression models for ordinal data","volume":"42","author":"McCullagh P.","year":"1980","journal-title":"Journal of the Royal Statistical Society B"},{"key":"e_1_2_9_27_1","unstructured":"Melamed I.D.(1995).Automatic evaluation and uniform filter cascades for inducing n\u2010best translation lexicons. In D. Yarovsky & K. Church (Eds.) Proceedings of the Third Workshop on Very Large Corpora (pp.184\u2013198). 
Association for Computational Linguistics."},{"key":"e_1_2_9_28_1","volume-title":"The ninth mental measurements yearbook","author":"Mitchell J.V.","year":"1985"},{"key":"e_1_2_9_29_1","volume-title":"Feature selection for classification based on text hierarchy","author":"Mladenic D.","year":"1998"},{"key":"e_1_2_9_30_1","doi-asserted-by":"crossref","unstructured":"Morales L.(2001).Assessing patient experiences with Healthcare in Multi\u2010Cultural Settings. RAND Corporation.","DOI":"10.7249\/RGSD157"},{"key":"e_1_2_9_30_2","unstructured":"Retrieved July 2002 from http:\/\/www.rand.org\/publications\/RGSD\/RGSD157\/RGSD157.ch3.pdf"},{"key":"e_1_2_9_31_1","doi-asserted-by":"crossref","first-page":"792","DOI":"10.21236\/ADA350490","volume-title":"Learning to classify text from labeled and unlabeled documents. Proceedings of the 15th National Conference on Artificial Intelligence","author":"Nigam K.","year":"1998"},{"key":"e_1_2_9_32_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"key":"e_1_2_9_33_1","unstructured":"ReadingA\u2010Z.com.(2003).Reading A\u2010Z Leveling and Correlation Chart."},{"key":"e_1_2_9_33_2","unstructured":"Retrieved May 2003 from http:\/\/www.readinga\u2010z.com\/newfiles\/correlate.html"},{"key":"e_1_2_9_34_1","unstructured":"Ryan K.(2001).The Lingua::Fathom module: Perl documentation:"},{"key":"e_1_2_9_34_2","unstructured":"Retrieved July 2002 from http:\/\/aspn.activestate.com\/ASPN\/CodeDoc\/Lingua\u2010EN\u2010Fathom\/Fathom.html"},{"key":"e_1_2_9_35_1","doi-asserted-by":"crossref","unstructured":"Si L. &Callan J.(2001).A statistical model for scientific readability. Proceedings of the 10th International Conference on Information and Knowledge Management (CIKM 2001) (pp.574\u2013576). 
Atlanta.","DOI":"10.1145\/502585.502695"},{"key":"e_1_2_9_36_1","volume-title":"Effective schools and classrooms","author":"Squires D.A.","year":"1983"},{"key":"e_1_2_9_37_1","unstructured":"Stenner A.J.(1996).Measuring reading comprehension with the Lexile framework. Durham NC: MetaMetrics."},{"key":"e_1_2_9_37_2","unstructured":"Retrieved June 2003 from http:\/\/www.lexile.com\/about_lex\/tech\u2010papers\/documents\/Measure.pdf"},{"key":"e_1_2_9_38_1","volume-title":"The Lexile framework","author":"Stenner A.J.","year":"1988"},{"key":"e_1_2_9_39_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1745-3984.1983.tb00209.x"},{"key":"e_1_2_9_40_1","volume-title":"A teacher's word book of 10,000 words","author":"Thorndike E.L.","year":"1921"},{"key":"e_1_2_9_41_1","first-page":"356","article-title":"A note on the statistical analysis of sentence length as a criterion of literary style","volume":"31","author":"Williams C.B.","year":"1940","journal-title":"Biometrika"}],"container-title":["Journal of the American Society for Information Science and 
Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fasi.20243","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/asi.20243","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,3]],"date-time":"2025-01-03T21:59:30Z","timestamp":1735941570000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/asi.20243"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,9]]},"references-count":47,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2005,11]]}},"alternative-id":["10.1002\/asi.20243"],"URL":"https:\/\/doi.org\/10.1002\/asi.20243","archive":["Portico"],"relation":{},"ISSN":["1532-2882","1532-2890"],"issn-type":[{"value":"1532-2882","type":"print"},{"value":"1532-2890","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,9]]}}}