Health Informatics

William Hersh

Information Retrieval: A Biomedical and Health Perspective
Fourth Edition

Health Informatics
This series is directed to healthcare professionals leading the transformation of
healthcare by using information and knowledge. For over 20 years, Health
Informatics has offered a broad range of titles: some address specific professions
such as nursing, medicine, and health administration; others cover special areas of
practice such as trauma and radiology; still other books in the series focus on
interdisciplinary issues, such as the computer based patient record, electronic health
records, and networked healthcare systems. Editors and authors, eminent experts in
their fields, offer their accounts of innovations in health informatics. Increasingly,
these accounts go beyond hardware and software to address the role of information
in influencing the transformation of healthcare delivery systems around the world.
The series also increasingly focuses on the users of the information and systems: the
organizational, behavioral, and societal changes that accompany the diffusion of
information technology in health services environments.
Developments in healthcare delivery are constant; in recent years, bioinformatics
has emerged as a new field in health informatics to support emerging and ongoing
developments in molecular biology. At the same time, further evolution of the field
of health informatics is reflected in the introduction of concepts at the macro or
health systems delivery level with major national initiatives related to electronic
health records (EHR), data standards, and public health informatics.
These changes will continue to shape health services in the twenty-first century.
By making full and creative use of the technology to tame data and to transform
information, Health Informatics will foster the development and use of new
knowledge in healthcare.

More information about this series at http://www.springer.com/series/1114


William Hersh
Oregon Health & Science University
Portland, OR
USA

ISSN 1431-1917     ISSN 2197-3741 (electronic)
Health Informatics
ISBN 978-3-030-47685-4    ISBN 978-3-030-47686-1 (eBook)
https://doi.org/10.1007/978-3-030-47686-1

© Springer Nature Switzerland AG 2020


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims
in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To Sally, Becca, Alyssa, and AJ

Preface

The main goal of this book is to provide an understanding of the theory, implementation,
and evaluation of information retrieval (IR) systems in biomedicine and health. There is
already a great deal of “how-to” information on searching for biomedical and health
information (some listed in Chap. 1). Similarly, there are also a number of high-quality
basic IR textbooks (also listed in Chap. 1). This volume is different from all of the
above in that it covers basic IR as do the latter books, but with a distinct focus on the
biomedical and health domain.
The first three editions of this book were published in 1996, 2003, and 2009. Although
subsequent editions of books in many fields represent incremental updates, this edition
is profoundly rewritten and is essentially a new book. The IR world has changed
substantially since I wrote the first three editions of the book. At the time of the first
edition, IR systems were available and not too difficult to access if you had the means
and expertise. Also, in that edition, the Internet was a “special topic” in the very last
chapter of the book. By the second edition, the World Wide Web had become a widespread
platform for information access and delivery, but had not achieved the nearly ubiquitous
and saturated use it has now. At present, however, not only must health care professionals
and biomedical researchers understand how to use IR systems to be effective in their work,
but patients and consumers must as well to attain optimal health care.
As with previous editions, a Web site will be maintained for errata and updates. The Web
site http://www.irbook.info/ will identify all errors in the book text as well as provide
updates on important new findings in the field as they become available.
As in the first three editions, the approach is still to introduce all the necessary
theory to allow coverage of the implementation and evaluation of IR systems in
biomedicine and health. Any book on theoretical aspects must necessarily use technical
jargon, and this book is no exception. Although jargon is minimized, it cannot be
eliminated without retreating to a more superficial level of coverage. The reader’s
understanding of the jargon will vary based on their background, but anyone with some
background in computers, libraries, health, and/or biomedicine should be able to
understand most of the terms used. In any case, an attempt to define all jargon terms is
made.
Another approach is to attempt wherever possible to classify topics, whether discussing
types of information or models of evaluation. I have always found classification useful in
providing an overview of complex topics. One problem, of course, is that not everything
fits into the neat and simple categories of the classification. This occurs repeatedly
with IR, and the reader is forewarned.
This book had its origins in a tutorial taught at the former Symposium on Computer
Applications in Medical Care (SCAMC) meeting. The content continues to grow each year
through my course taught to biomedical informatics students in the on-campus and
distance-learning programs at OHSU. (Students often do not realize that next year’s course
content is based in part on the new and interesting things they teach me!) The book can be
used in either a basic information science course or a biomedical and health informatics
course. It should also provide a strong background for others interested in this topic,
including those who design, implement, use, and evaluate IR systems.
Interest continues to grow in biomedical and health IR systems. I entered a fellowship in
medical informatics at Harvard University in the late 1980s, during the initial era of
medical artificial intelligence. I had assumed I would take up the banner of some aspect
of that area, such as knowledge representation. But along the way I came across a
reference from the field of “information retrieval.” It looked interesting, so I looked at
the references of that reference. It did not take long to figure out that this was where
my real interests lay, and I spent many an afternoon in my fellowship tracing references
in the Harvard University and Massachusetts Institute of Technology libraries. Even though
I had not yet heard of the field of bibliometrics, I was personally validating all its
principles. Like many in the field, I have been amazed to see IR become so “mainstream”
with its routine use by almost everyone on the planet.
The book is divided into eight chapters. Chapter 1 provides basic definitions and models
that will be used throughout the book. It also points to resources for the field and
introduces evaluation of systems. Chapter 2 provides an overview of biomedical and health
information, describing some of the issues in its production, dissemination, and use.
Chapter 3 gives an overview of the great deal of content that is currently available.
Chapters 4 and 5 cover the two fundamental intellectual tasks of IR, indexing and
retrieval, with the predominant paradigms of each discussed in detail. Chapter 6 discusses
the methods and challenges of larger information access. Chapter 7 focuses on evaluation
research that has been done on state-of-the-art systems in the biomedical and health
domain. Finally, Chapter 8 explores research about IR systems and their users, with an
emphasis on applications in the biomedical and health domain. Within each chapter, the
goal is to provide a comprehensive overview of the topic, with thorough citations of
pertinent references. There is a preference for discussing biomedical and health
implementations of principles, but where this is not possible, the original domain of
implementation is discussed.
This book would not have been possible without the influence of various mentors, dating
back to high school, who nurtured my interests in science generally and/or biomedical and
health informatics specifically, and/or helped me achieve my academic and career goals.
The most prominent include Mr. Robert Koonz (then of New Trier West High School,
Northfield, IL), Dr. Darryl Sweeney (then of University of Illinois at Champaign-Urbana),
Dr. Robert Greenes (then of Harvard Medical School), Dr. David Evans (then of Carnegie
Mellon University), Dr. Mark Frisse (then of Washington University), Dr. J. Robert Beck
(then of OHSU), Dr. David Hickam (then of OHSU), Dr. Brian Haynes (McMaster University),
Dr. Lesley Hallick (then of OHSU), and Dr. Jerris Hedges (then of OHSU). I must also
acknowledge the late Dr. Gerard Salton (Cornell University), whose writings initiated and
sustained my interest in this field.
I would also like to note the contributions of institutions and people in the federal
government who aided the development of my career and this book. While many Americans
increasingly question the abilities of their government to do anything successfully, the
National Library of Medicine (NLM), under the former directorship of the late Dr. Donald
A. B. Lindberg and the current directorship of Dr. Patricia Flatley Brennan, has led the
growth and advancement of the field of biomedical and health informatics. The NLM’s
fellowship and research funding have given me the skills and experience to succeed in this
field. I would also like to acknowledge the late Oregon Senator Mark O. Hatfield, whose
dedication to biomedical research funding aided me and many others.
Finally, this book also would not have been possible without the love and support
of my family. All of my parents, Mom and Jon, Dad and Gloria, as well as my
brother Jeff and sister-in-law Myra, supported the various interests I developed in
life and the somewhat different career path I chose. I think that now as they have
become Web users and searchers, they appreciate my interest in this area. And last,
but most importantly, has been the contribution of my wife, Sally, and two children,
Becca and Alyssa, whose unlimited love and support made this undertaking so
enjoyable and rewarding.

March 2020  William Hersh


Portland, OR
Contents

1 Foundations
    1.1 Basic Definitions
    1.2 Scientific Disciplines Concerned with IR
    1.3 Models of IR
        1.3.1 The Information World
        1.3.2 Users
        1.3.3 Health Decision-Making
        1.3.4 Knowledge Acquisition and Use
    1.4 IR Resources
        1.4.1 Organizations
        1.4.2 Journals
        1.4.3 Texts
        1.4.4 Tools
    1.5 The Internet and World Wide Web
        1.5.1 Users
        1.5.2 Usage
        1.5.3 Hypertext and Linking
    1.6 Evaluation
        1.6.1 Classification of Evaluation
        1.6.2 Relevance-Based Evaluation
        1.6.3 Challenge Evaluations
    References
2 Information
    2.1 What Is Information?
    2.2 Theories of Information
    2.3 Properties of Scientific Information
        2.3.1 Growth
        2.3.2 Obsolescence
        2.3.3 Fragmentation
        2.3.4 Linkage and Citations
        2.3.5 Propagation
    2.4 Classification of Health Information
    2.5 Production of Biomedical and Health Information
        2.5.1 Generation of Scientific Information
        2.5.2 Peer Review
        2.5.3 Primary Literature
        2.5.4 Systematic Reviews and Meta-Analysis
        2.5.5 Secondary Literature
    2.6 Electronic Publishing
        2.6.1 Electronic Scholarly Publication
        2.6.2 Consumer Health Information
    2.7 Use of Knowledge-Based Health Information
        2.7.1 Models of Physician Thinking
        2.7.2 Physician Information Needs
        2.7.3 Information Needs of Other Healthcare Professionals
        2.7.4 Information Needs of Biomedical and Health Researchers
        2.7.5 Information Needs of Consumers
    2.8 Summary
    References
3 Content
    3.1 Classification of Health and Biomedical Information Content
    3.2 Bibliographic Content
        3.2.1 Literature Reference Databases
        3.2.2 Web Catalogs and Feeds
        3.2.3 Specialized Registries
    3.3 Full-Text Content
        3.3.1 Periodicals
        3.3.2 Books and Reports
        3.3.3 Web Collections
    3.4 Annotated Content
        3.4.1 Images and Videos
        3.4.2 Citations
        3.4.3 Evidence-Based Medicine Resources
        3.4.4 Molecular Biology and -Omics
        3.4.5 Educational Resources
        3.4.6 Linked Data
        3.4.7 Other Annotated Content
    3.5 Aggregations
        3.5.1 Consumer Health
        3.5.2 Health Professionals
        3.5.3 Body of Knowledge
        3.5.4 Model Organism Databases
        3.5.5 Scientific Information
    References
4 Indexing
    4.1 Types of Indexing
    4.2 Factors Influencing Indexing
    4.3 Controlled Vocabularies
        4.3.1 General Principles of Controlled Vocabularies
        4.3.2 The Medical Subject Headings (MeSH) Vocabulary
        4.3.3 Other Indexing Vocabularies
        4.3.4 The Unified Medical Language System
    4.4 Manual Indexing
        4.4.1 Bibliographic Manual Indexing
        4.4.2 Full-Text Manual Indexing
        4.4.3 Web Manual Indexing
        4.4.4 Limitations of Manual Indexing
    4.5 Automated Indexing
        4.5.1 Word Indexing
        4.5.2 Limitations of Word Indexing
        4.5.3 Word Weighting
        4.5.4 Link-Based Indexing
        4.5.5 Web Crawling
    4.6 Indexing Annotated Content
        4.6.1 Index Imaging
        4.6.2 Indexing Learning Objects
        4.6.3 Indexing Biomedical and Health Data
    4.7 Data Structures for Efficient Retrieval
    References
5 Retrieval
    5.1 Search Process
    5.2 General Principles of Searching
        5.2.1 Exact-Match Searching
        5.2.2 Partial-Match Searching
        5.2.3 Term Selection
        5.2.4 Other Attribute Selection
    5.3 Searching Interfaces
        5.3.1 Bibliographic
        5.3.2 Full Text
        5.3.3 Annotated
        5.3.4 Aggregations
    5.4 Document Delivery
    5.5 Notification or Information Filtering
    References
6 Access
    6.1 Libraries
        6.1.1 Definitions and Functions of DLs
    6.2 Access to Content
        6.2.1 Access to Individual Items
        6.2.2 Access to Collections
        6.2.3 Access to Metadata
        6.2.4 Integration with Other Applications
    6.3 Copyright and Intellectual Property
        6.3.1 Copyright and Fair Use
        6.3.2 Digital Rights Management
    6.4 Open Access and Open Science
        6.4.1 Open-Access Publishing
        6.4.2 NIH Public Access Policy
        6.4.3 Predatory Journals
    6.5 Preservation
    6.6 Librarians, Informationists, and Other Professionals
    6.7 Future Directions
    References
7 Evaluation
    7.1 Usage Frequency
    7.2 Types of Usage
    7.3 User Satisfaction
    7.4 Searching Quality
        7.4.1 System-Oriented Performance Evaluations
        7.4.2 User-Oriented Performance Evaluations
    7.5 Factors Associated with Success or Failure
        7.5.1 Predictors of Success
        7.5.2 Analysis of Failure
    7.6 Assessment of Impact
    7.7 Research on Relevance
        7.7.1 Topical Relevance
        7.7.2 Situational Relevance
        7.7.3 Research About Relevance Judgments
        7.7.4 Limitations of Relevance-Based Measures
        7.7.5 Automating Relevance Judgments
        7.7.6 Measures of Agreement
    7.8 What Has Been Learned About IR Systems?
    References
8 Research
    8.1 Frameworks and Challenge Evaluations
    8.2 Biomedical and Health IR Research
        8.2.1 Early Studies
        8.2.2 Challenge Evaluations in Biomedicine and Health
        8.2.3 Ad Hoc Retrieval
        8.2.4 Consumer-Oriented
        8.2.5 Image Retrieval
        8.2.6 High-Recall Retrieval
        8.2.7 EHR Retrieval
    8.3 General IR Research
        8.3.1 Overview of Early Research
        8.3.2 Machine Learning: Uncovering Latent Meaning
        8.3.3 Natural Language Processing
        8.3.4 Question-Answering
        8.3.5 Text Categorization
    8.4 Research Systems and the User
        8.4.1 Early Research
        8.4.2 User Evaluation of Research Systems
        8.4.3 TREC Interactive Track
    8.5 Looking Forward
    References

Index

Chapter 1
Foundations

The goal of this book is to present the field of information retrieval (IR), sometimes
called search, with an emphasis on the biomedical and health domain. To many,
“information retrieval” implies retrieving information of any type from a computer.
However, to those working in the field, IR has a different, more specific meaning,
which is the retrieval of information from databases that predominantly contain
textual information. A field at the intersection of information science and computer
science, IR concerns itself with the indexing and retrieval of information from
heterogeneous and mostly textual information resources. The term was coined by
Mooers in 1951, who advocated that it be applied to the “intellectual aspects” of
description of information and systems for its searching [1].
The advancement of computer technology continues to alter the nature of IR. As
recently as the 1970s, Lancaster stated that an IR system does not inform the user
about a subject; it merely indicates the existence (or nonexistence) and whereabouts
of documents related to an information request [2]. At that time, of course,
computers had considerably less power and storage than today’s personal computers, and
there was no Internet connecting the world’s computers and other information
devices to each other. In the 1970s, computers and network systems were sufficient
only to handle bibliographic databases, which contained just the title, source, and
a few indexing terms for documents. Furthermore, the high cost of computer
hardware and telecommunications usually made it prohibitively expensive for end users
to directly access such systems, so they had to submit requests that were run in
batches and returned hours to days later.
In the twenty-first century, however, the state of computers and IR systems is
much different, leading to new perspectives on the nature of the field [3, 4].
End-user access to massive amounts of information in databases and on the World Wide
Web is routine. A recent monograph traces the history of IR from early experiments
in the 1960s through the advent of ubiquitous search systems in the 2000s [5].
Not only can IR databases contain the full text of resources, but they may also
contain images, sounds, and even video sequences. Indeed, there is now the notion
of the digital library, where journals and books are mostly provided in digital form
and library buildings are augmented by far-reaching computer networks [6, 7]. The
scientific publishing enterprise has been transformed to increasingly open science,
with access not only to research publications but also their underlying data [8]. New
models for delivering knowledge have been proposed, such as the Mobilizing
Computable Biomedical Knowledge initiative, whose manifesto calls for knowledge
to be provided in “computable formats that can be shared and integrated into
health information systems and applications” [9].

© Springer Nature Switzerland AG 2020
W. Hersh, Information Retrieval: A Biomedical and Health Perspective,
Health Informatics, https://doi.org/10.1007/978-3-030-47686-1_1

So transformative and ubiquitous has IR become that the name of the leading
Web search engine, Google, has entered the vernacular in a variety of ways, including
as a verb (i.e., using a search engine to look something up is called “Googling”)
[10]. Google Trends¹ (formerly Zeitgeist) keeps a tally of the world’s interests
as measured by what humans collectively type into the Google search engine. In
addition, some lament that the “Google generation,” i.e., today’s legions of
technology-savvy young people, are not critical enough in seeking,
synthesizing, and analyzing information [11].
One of the early motivations for IR systems was the ability to improve access to
information. Noting that the work of early geneticist Gregor Mendel went
undiscovered for nearly 30 years, Vannevar Bush called in the 1940s for science to create
better means of accessing scientific information [12]. In current times, there is equal
if not greater concern with “information overload” and how to avoid missing
important information. A well-known example occurred when a patient who died in a
clinical trial in 2000 might have survived if information about the toxicity of the
agent being studied from the 1950s (before the advent of MEDLINE) had been
more readily accessible [13]. Indeed, a major challenge in IR is helping users find
“what they don’t know” [14].
Just how much information is out there? One analysis estimated the amount of
digital data in the world to be 33 zettabytes in 2018, with a projection to grow to 175
zettabytes by 2025 [15]. (A zettabyte is 10^21 bytes, or one billion terabytes.) Another
report estimated the amount of computer network traffic to triple between 2017 and
2022 to 4.8 zettabytes per year, which would be about equal to all previous Internet
traffic from 1984 to 2018 [16]. Another analysis noted that 3.8 million searches of
Google are done every minute [17]. In the 2000s, Card published a figure comparing
the exponential growth of information as it surpassed the estimate of the size of all
documents created in human history (40,000 years), well above the estimated
amount of information a human could learn in a year (see Fig. 1.1) [18]. A current
estimate of the health information on a single human over their lifetime is over 1000
terabytes, with the majority of the information coming from social determinants of health
and health behaviors, and only a tiny fraction representing clinical data (<1 terabyte)
[19]. Of course, only a small part of this data is the kind of text and other
information we might wish to retrieve using an IR system. Nonetheless, it was
estimated in 2016 that Google had indexed 130 trillion Web pages in its search
engine [20].
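
The scale of these figures is easier to grasp with a quick calculation. The sketch below is illustrative only: it checks the zettabyte-to-terabyte conversion and derives the yearly growth rate implied by the 33 and 175 zettabyte estimates. It assumes smooth compound growth between 2018 and 2025, which the cited analysis does not necessarily claim, and the function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the data-volume figures quoted in the text.
# Assumption: growth between the two estimates is smooth and compounding.

ZETTABYTE = 10**21  # bytes
TERABYTE = 10**12   # bytes


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that takes `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# One zettabyte expressed in terabytes (the text says "one billion").
terabytes_per_zettabyte = ZETTABYTE // TERABYTE

# 33 ZB in 2018 growing to a projected 175 ZB in 2025.
growth = compound_annual_growth(33, 175, 2025 - 2018)

print(f"1 ZB = {terabytes_per_zettabyte:,} TB")
print(f"Implied growth: {growth:.1%} per year")
```

Under that assumption, the projection corresponds to the world's digital data growing by roughly a quarter each year.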

¹ https://trends.google.com