Linking Database Submissions to Primary Citations with PubMed Central
Heather Piwowar and Wendy Chapman
Department of Biomedical Informatics, University of Pittsburgh
BioLINK 2008
 
 
These links are important for several reasons
 
 
 
Sometimes the links are easy to discover
 
 
 
 
But the meaning of hyperlinks is ambiguous:
And often no hyperlinks at all:
One way to identify links: NLP systems that identify statements of shared data from within full text.
BUT this requires developing and maintaining a full-text archive!
What about using PubMed Central?
Usage?
  scientists looking for datasets for reuse
  curators looking for primary citations
  researchers studying data sharing behaviour
Goal: Use the simple, full-text query interface of PubMed Central to identify articles with shared datasets
Method:
  Gene expression microarray data
  GEO database
Method:
  Open-access articles to train
  Non-open-access articles to test
  Gene-expression articles selected by MeSH term query
Gold Standard:
  True positives (N=550): articles with primary citation links from GEO + screening of full text
  True negatives (N=165): the rest
Building the query:
  Used full text of the open-access cohort
  Removed words with <40 occurrences
  Unigram bag-of-words vectors
  Tree and rule algorithms, with a variety of parameters
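The feature-building steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the tokenizer, the choice of binary (presence/absence) features, the reading of "occurrences" as total count across the corpus, and the toy documents are all assumptions. The tree and rule learners themselves are not shown.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters as unigrams (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents, min_count=40):
    """Keep words whose total count across all documents is at least min_count."""
    totals = Counter()
    for doc in documents:
        totals.update(tokenize(doc))
    return sorted(word for word, n in totals.items() if n >= min_count)

def to_vector(document, vocabulary):
    """Binary unigram vector: 1 if the word occurs in the document, else 0."""
    present = set(tokenize(document))
    return [1 if word in present else 0 for word in vocabulary]

# Toy corpus (hypothetical sentences, not taken from the study).
docs = [
    "Microarray data were deposited in GEO with an accession number.",
    "We downloaded published microarray data from public databases.",
]
# A tiny threshold for a tiny corpus; the slide uses 40.
vocab = build_vocabulary(docs, min_count=2)
vectors = [to_vector(d, vocab) for d in docs]
```

Vectors like these could then be fed to a decision-tree or rule learner, whose learned splits on individual words translate naturally into AND / OR / NOT query terms.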
(geo OR omnibus) AND microarray AND "gene expression" AND accession NOT (databases OR user OR users OR (public AND accessed) OR (downloaded AND published))
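To make the logic of this query concrete, here is a small sketch that applies the same Boolean conditions to a piece of text locally. It only approximates what PubMed Central does: the real search engine has its own tokenization and indexing, and the helper names and example sentences below are hypothetical.

```python
import re

def _tokens(text):
    """Lowercase letter-only tokens, in order (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def matches_query(text):
    """Return True if the text satisfies the slide's Boolean query."""
    tokens = _tokens(text)
    words = set(tokens)
    # Phrase match for "gene expression" on the normalized token stream.
    normalized = " " + " ".join(tokens) + " "
    has_phrase = " gene expression " in normalized

    required = (
        ("geo" in words or "omnibus" in words)
        and "microarray" in words
        and has_phrase
        and "accession" in words
    )
    excluded = (
        "databases" in words
        or "user" in words
        or "users" in words
        or ("public" in words and "accessed" in words)
        or ("downloaded" in words and "published" in words)
    )
    return required and not excluded

# Hypothetical sentences: one describing a deposit, one describing reuse.
deposit = "The microarray gene expression data are available in GEO under an accession number."
reuse = "We downloaded published microarray gene expression data from GEO using the accession listed."
print(matches_query(deposit), matches_query(reuse))
```

The positive terms look for language typical of a data deposit, while the NOT clause filters out articles that describe using or downloading existing datasets rather than sharing new ones.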
Evaluation Results
  40% recall
  94% precision, 65% for those not yet linked
  worse than full-NLP results (~89%, 83%)
  slightly better than trivial query (34%, 90%)
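For readers less familiar with these metrics, the sketch below shows how recall and precision are computed from retrieval counts. The counts in the example are hypothetical, chosen only so that the arithmetic reproduces rates of 94% and 40%; they are not the study's actual confusion matrix.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall

# Hypothetical counts (NOT from the study): 100 articles retrieved by the query,
# 94 of which share data, out of 235 data-sharing articles in total.
precision, recall = precision_recall(true_positives=94, false_positives=6, false_negatives=141)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with low recall means that almost every article the query returns does share data, but most data-sharing articles are missed.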
Limitations
  only one datatype
  database-centric
  performance so far is rather mediocre…
Impact?
  Today’s performance: would increase GEO links by 2.6%, and by 5.5% annually once all NIH-funded articles are in PMC
  Double the recall, to 80%: double the numbers above
  GEO curators added the 40 links identified by this study
We hope this work inspires future enhancements, and highlights the opportunities for simple full-text queries in PubMed Central given the mandated influx of NIH-funded research reports.
Thank you
  Advisor: Dr. Wendy Chapman
  Funders: NLM and Pitt DBMI
  Enablers: Everyone who deposits their publications in PubMed Central!
  My shared data: www.dbmi.pitt.edu/piwowar
  Share your research data too!
