Resource Description Framework Approach to Data Publication and Federation
Robert Stanley, IO Informatics, Inc.
Pistoia Alliance – Open Technical Committee Meeting, June 6, 2011
IO Informatics © 2011
Agenda
• Introduction: requirements, background, use cases
• Technical example / use case: requirements for creating a data service and invoking a query
• Q & A
Recap - ELN Query Functional Requirements
• The ELN Query Services team has produced a rich set of functional requirements / user stories representing common questions that scientists can answer with an ELN query.
• Most, if not all, of those requirements have the conceptual form: SELECT <selection> FROM Experiment WHERE <constraints> …
• A common approach to the problem would be preferable, ideally leveraging existing standards, rather than building a solution that works only for Experiment.
• … the Pistoia Technical Committee may wish to constrain such query services by aligning them to existing standards, to ensure a consistent approach.
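The conceptual form above maps naturally onto a SPARQL SELECT: "FROM Experiment" becomes a type constraint on the subject, the selection becomes projected variables, and each constraint becomes a triple pattern with a fixed value. The sketch below assumes a hypothetical `eln:` vocabulary at `http://example.org/eln#` (not a published Pistoia ontology) and assumes property names are valid SPARQL local names.

```python
# Hypothetical ELN vocabulary; not a published Pistoia or IO Informatics ontology.
ELN_PREFIX = "http://example.org/eln#"

def eln_select(selection, constraints):
    """Translate SELECT <selection> FROM Experiment WHERE <constraints>
    into a SPARQL SELECT query string.

    selection   -- list of property names to return, e.g. ["title", "author"]
    constraints -- dict mapping property name -> required literal value
    """
    lines = [f"PREFIX eln: <{ELN_PREFIX}>"]
    variables = " ".join(f"?{name}" for name in selection)
    lines.append(f"SELECT ?experiment {variables}")
    lines.append("WHERE {")
    # "FROM Experiment" becomes a type constraint on the subject.
    lines.append("  ?experiment a eln:Experiment .")
    for name in selection:
        lines.append(f"  ?experiment eln:{name} ?{name} .")
    for name, value in constraints.items():
        # Escape backslashes and quotes so the literal stays well formed.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  ?experiment eln:{name} "{escaped}" .')
    lines.append("}")
    return "\n".join(lines)

query_text = eln_select(["title", "author"], {"project": "Kinase-X"})
print(query_text)
```

Because the generated query is plain SPARQL, the same request form works for any entity type, not only Experiment, which is the generality the requirements ask for.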
Example ELN Query Workflow
A scientist researching a class of:
• Agents: small molecules (or biologics) intended to hit a target or targets
  links to…
• Assays: tests to determine activity, affinity, binding and promiscuity, and to determine potential toxicity, adverse events, etc.
  links to…
• Targets: sites where compounds bind; these can be locations on a protein, locations on a gene, active centers on an enzyme, etc.
  links to…
• Disease / gene relationships: e.g. biology, which can come from TMO / LODD resources; pathways, proteins, catalysts, immunology defense mechanisms, potential for adverse events, etc. can be included
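The "links to…" chain above is what a SPARQL basic graph pattern expresses: several triple patterns sharing variables, joined together. The following is a minimal in-memory sketch of that join, assuming toy data with hypothetical identifiers and predicate names (`testedIn`, `measuresTarget`, `associatedWith`) that do not come from any real ELN or LODD source.

```python
# Toy data with hypothetical identifiers, for illustration only.
TRIPLES = {
    ("agent:A1", "testedIn", "assay:S1"),
    ("assay:S1", "measuresTarget", "target:T1"),
    ("target:T1", "associatedWith", "disease:D1"),
    ("agent:A2", "testedIn", "assay:S2"),
    ("assay:S2", "measuresTarget", "target:T2"),
}

def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new

def query(patterns, triples):
    """Return every variable binding that satisfies all patterns (a join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

# Agent -> Assay -> Target -> Disease, joined on shared variables.
workflow = [
    ("?agent", "testedIn", "?assay"),
    ("?assay", "measuresTarget", "?target"),
    ("?target", "associatedWith", "?disease"),
]
results = query(workflow, TRIPLES)
for row in results:
    print(row["?agent"], "->", row["?disease"])
```

Agent A2 drops out because its target has no disease link, which is exactly how an unmatched pattern behaves in a SPARQL join. A real endpoint would use indexes rather than this nested scan.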
Recap - Query Service API
• As part of the phase two deliverable, the ELN Query Services team produced a prototype SOAP-based service outlining the methods that would be required to support the query service.
• There are existing protocols and standards for querying structured data over the web. Aligning to an existing approach avoids reinventing a query language and provides confidence in the stability of the query interface. Examples:
  ◦ OData (http://www.odata.org/) is a RESTful query protocol …
  ◦ GData (http://code.google.com/apis/gdata/) is very similar to OData, but published by Google …
  ◦ SPARQL (http://www.w3.org/TR/rdf-sparql-query/) is an RDF query language and protocol published by the W3C for querying linked data. A triple store is [not] required, and adopting SPARQL may create synergies with existing Pistoia projects (e.g. VSI, SESL).
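To make the SPARQL option concrete: under the SPARQL Protocol, a query can be sent as an HTTP GET with the URL-encoded query text in a `query` parameter, and the client can ask for JSON results through the `Accept` header. This sketch only builds the request; the endpoint URL is a hypothetical placeholder, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# Hypothetical endpoint URL; substitute a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

def sparql_get_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = sparql_get_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))

# The query text survives the round trip through URL encoding.
print(parse_qs(urlparse(req.full_url).query)["query"][0])

# To execute against a live endpoint: urllib.request.urlopen(req),
# then json.load() the response body.
```

The point of the comparison in the slide is that this interface is already specified: clients need no custom SOAP method per entity type, only a query string and a standard result format.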
Questions for Tech Committee
• Ontology content and format, regarding reference content:
  ◦ How comprehensive and refined does it need to be?
  ◦ How can it relate to records at different levels of detail?
  ◦ Who is going to maintain the ontology?
  ◦ Is there a practical, agile approach to creating, managing and extending it?
  ◦ Standards-driven? (UML, OWL, ERD, etc.)
• Investigate service versioning and how backward compatibility might be implemented
• API mechanism (OData, GData, SPARQL?): is there a standards-based approach?
Moving Forward? Resource Description Framework
Semantic technologies provide a unified, standard framework for:
• Ontology / API representation, modification and merging
• Query framework / SOA: SPARQL API
• Rich, extensible query federation (does not require ETL)
• Agile ontology development and maintenance: customers can extend the ontologies themselves
Broadly, these are standards-driven methods that will make the ELN query federation lifecycle much easier.
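One way to federate queries without ETL is the `SERVICE` keyword, later standardized in SPARQL 1.1 Federated Query: each block of patterns is evaluated at a remote endpoint, and results are joined on shared variables. The helper below only composes such a query string; both endpoint URLs and the `ex:` vocabulary are hypothetical placeholders.

```python
# Hypothetical endpoints and vocabulary, for illustration only.
ELN_ENDPOINT = "http://example.org/eln/sparql"
PUBLIC_ENDPOINT = "http://example.org/lodd/sparql"

def federated_query(parts, select_vars):
    """Compose one SPARQL query whose graph patterns are evaluated at
    different endpoints via SERVICE clauses, joined on shared variables.

    parts       -- list of (endpoint_url, [triple pattern strings])
    select_vars -- variables to project, e.g. ["?experiment", "?target"]
    """
    lines = ["PREFIX ex: <http://example.org/vocab#>",
             "SELECT " + " ".join(select_vars),
             "WHERE {"]
    for endpoint, patterns in parts:
        lines.append(f"  SERVICE <{endpoint}> {{")
        lines.extend(f"    {p} ." for p in patterns)
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)

# ?compound is shared, so internal ELN records join with public target data.
fed = federated_query(
    [(ELN_ENDPOINT, ["?experiment ex:testedCompound ?compound"]),
     (PUBLIC_ENDPOINT, ["?compound ex:hasTarget ?target"])],
    ["?experiment", "?target"],
)
print(fed)
```

The data stays at each source; only the query and its results move, which is the sense in which federation "does not require ETL".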
What is RDF?
• Resource Description Framework (“RDF”) defines and links data for more effective discovery, federation, integration and re-use across applications.
• RDF is a fundamentally simple, standard and extensible way of specifying data and data relationships.
• Ontologies describe resources and relationships according to their explicit meaning.
• SPARQL Protocol and RDF Query Language (“SPARQL”) supports federated queries, with a SPARQL API for publication.
Resource Description Framework (RDF) is…
A labeled, directed graph of relations between resources and literal values.
• RDF graphs are collections of triples.
• Triples are made up of a subject, a predicate and an object.
• Both resources and their relationships (metadata) are stored.
[Diagram: subject --predicate--> object]
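As a data structure, the definition above is small: a graph is a set of (subject, predicate, object) triples, and each predicate is the label on a directed edge. This minimal sketch uses plain strings for every term, whereas real RDF distinguishes IRIs, literals and blank nodes; the class and method names are illustrative, not from any RDF library.

```python
from typing import NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str  # the label on the directed edge
    obj: str

class Graph:
    """A minimal RDF-like graph: a set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add(Triple(s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self.triples:
            if ((s is None or t.subject == s) and
                    (p is None or t.predicate == p) and
                    (o is None or t.obj == o)):
                yield t

g = Graph()
g.add("TP53", "encodes", "p53")
g.add("p53", "is a", "tumor suppressor protein")
g.add("TP53", "located", "chromosome 17")
g.add("TP53", "encodes", "p53")  # duplicate triple: the set stores it once
print(len(g.triples))
print(sorted(t.obj for t in g.match(s="TP53")))
```

Storing triples as a set reflects RDF's semantics: asserting the same statement twice adds no information.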
Example RDF Triples
• “TP53 encodes human p53”: TP53 --encodes--> p53
• “p53 is a tumor suppressor protein”: p53 --is a--> tumor suppressor protein
• “The TP53 gene is located on the short arm of chromosome 17”: TP53 --located--> chromosome 17
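In a published dataset, each term in these triples would be an IRI, and the triples could be exchanged as N-Triples: one `<subject> <predicate> <object> .` statement per line, with literals in double quotes. The sketch mints IRIs under a hypothetical `http://example.org/` namespace; `rdf:type` and `rdfs:label` are the real W3C IRIs, and mapping the slide's "is a" to `rdf:type` is one modeling choice among several.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for illustration
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def iri(name):
    """Mint a hypothetical IRI from a plain-text name."""
    return BASE + quote(name.replace(" ", "_"))

def nt_iri(value):
    """Render an IRI in N-Triples syntax."""
    return f"<{value}>"

def nt_literal(text):
    """Render a plain string literal, escaping characters N-Triples requires."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

triples = [
    (iri("TP53"), iri("encodes"), iri("p53")),
    (iri("p53"), RDF_TYPE, iri("tumor suppressor protein")),
    (iri("TP53"), iri("located"), iri("chromosome 17")),
]
labels = [(iri("TP53"), RDFS_LABEL, "TP53")]  # object is a literal, not an IRI

lines = [f"{nt_iri(s)} {nt_iri(p)} {nt_iri(o)} ." for s, p, o in triples]
lines += [f"{nt_iri(s)} {nt_iri(p)} {nt_literal(o)} ." for s, p, o in labels]
document = "\n".join(lines)
print(document)
```

Each line stands alone, so N-Triples files from different sources can simply be concatenated to merge them.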
Triples Connect to Form Graphs
[Diagram: the triples above joined through their shared nodes (TP53, p53, tumor suppressor protein, chromosome 17, Human), with edges labeled encodes, is a, located and part of (twice)]
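Triples form a graph because the same node name appears in several statements, so a query can walk from one fact to the next. The exact "part of" edges in the original diagram are not recoverable; the sketch assumes a single illustrative edge, chromosome 17 part of Human, and does a breadth-first walk from TP53, recording the edge path to each reachable node.

```python
from collections import defaultdict, deque

triples = [
    ("TP53", "encodes", "p53"),
    ("p53", "is a", "tumor suppressor protein"),
    ("TP53", "located", "chromosome 17"),
    ("chromosome 17", "part of", "Human"),  # assumed edge, for illustration
]

# Index outgoing edges: shared node names are what join triples into a graph.
edges = defaultdict(list)
for s, p, o in triples:
    edges[s].append((p, o))

def paths_from(start):
    """Breadth-first walk returning each reachable node with its edge path."""
    found = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for p, o in edges[node]:
            if o not in found:
                found[o] = found[node] + [(node, p, o)]
                queue.append(o)
    return found

reachable = paths_from("TP53")
for node, path in reachable.items():
    print(node, "via", [p for _, p, _ in path])
```

No triple says "TP53 is in Human" directly; that connection emerges only once the separate statements are linked through chromosome 17.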
Why RDF? What’s Different Here?
• Triples act as a human- and machine-readable least common denominator for expressing data and relationships.
• Ontologies organize data according to its human-readable meaning, making defining and merging data intuitive…
• RDF supports inference and disambiguation, so it becomes possible to merge data and add new data and relationships without shared identifiers…
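One common way to merge sources that lack shared identifiers is to assert (or infer) that two identifiers denote the same thing, as with `owl:sameAs`, and then rewrite each identifier to one canonical representative. The sketch below does this with a union-find structure; it is a simplified illustration, not full OWL reasoning, and every identifier in it is hypothetical.

```python
def merge_with_same_as(*graphs, same_as):
    """Union several triple sets, then rewrite every identifier to one
    canonical representative of its sameAs-equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller name as representative,
            # so the result does not depend on input order.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for a, b in same_as:
        union(a, b)

    merged = set()
    for g in graphs:
        for s, p, o in g:
            merged.add((find(s), p, find(o)))
    return merged

# Two hypothetical sources naming the same gene differently.
eln_data = {("elnA:TP53", "measuredIn", "elnA:assay42")}
public_data = {("pubB:gene-tp53", "encodes", "pubB:protein-p53")}
merged = merge_with_same_as(eln_data, public_data,
                            same_as=[("elnA:TP53", "pubB:gene-tp53")])
for t in sorted(merged):
    print(t)
```

After the merge, facts from both sources hang off a single node, so one query sees the internal assay and the public gene annotation together.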
Maturation of a Standard Framework for Resource Description
This standard method and framework for extensible open data publication is maturing:
• Standards and practices, ontologies, query and data model specification: W3C, ISCB, IEEE, NCBO, OBO, SKOS, LOD / LODD and related resources
• Federation methods: ontology resources, URIs, SPARQL endpoints, ontology alignment, inference, D2R, SWObjects, …
• Scalability: security, support for transactional processing
• Practice: expertise, training, larger projects (FDA, DOD, NASA, World Cup, Data.gov, Chevron, etc.)
Exponentially Growing Resources
[Figure: resource growth shown for 2007, 2008, 2009 and 2011]
Healthcare / Life Sciences remains the largest data sector for SPARQL APIs
Integration / Federation Options
• Data sources: RDBMS / REST / web services
• Basic transformation
• Query-based transformation: SPARQL-to-SQL (D2R, SWObjects)
• Federated semantic access / datastore(s): SPARQL APIs / SOAP / REST
• Provenance, versioning, governance, meaning
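The SPARQL-to-SQL option rests on a mapping from relational rows to triples: each row becomes a resource and each column a property. This sketch is similar in spirit to what D2R-style tools do, but it is not D2R's mapping language; the table, IRIs and data are hypothetical. It shows both the row-to-triple view and the "query-based" alternative of rewriting one triple pattern into SQL, so the data never has to be copied.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace

def table_to_triples(conn, table, key):
    """Map each row of `table` to a resource and each column to a property.
    `table` and `key` are trusted identifiers here, not user input."""
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for col, value in record.items():
            if col == key or value is None:
                continue  # the key is in the IRI; NULLs produce no triple
            triples.append((subject, f"{BASE}vocab#{table}_{col}", value))
    return triples

def pattern_to_sql(table, key, predicate):
    """Rewrite the triple pattern (?s, predicate, ?o) into SQL over `table`,
    assuming predicate IRIs of the form BASE + 'vocab#' + table + '_' + column."""
    prefix = f"{BASE}vocab#{table}_"
    if not predicate.startswith(prefix):
        raise ValueError("predicate is not mapped to this table")
    column = predicate[len(prefix):]
    if not column.isidentifier():
        raise ValueError("unexpected column name")
    return f"SELECT {key}, {column} FROM {table} WHERE {column} IS NOT NULL ORDER BY {key}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT, target TEXT)")
conn.executemany("INSERT INTO compound VALUES (?, ?, ?)",
                 [(1, "cmpd-001", "kinase-X"), (2, "cmpd-002", None)])

mapped = table_to_triples(conn, "compound", "id")
for t in mapped:
    print(t)

sql = pattern_to_sql("compound", "id", BASE + "vocab#compound_target")
print(sql)
print(conn.execute(sql).fetchall())
```

The materialized triples correspond to the "basic transformation" path, while the rewritten SQL corresponds to the "query-based transformation" path; real tools apply the same mapping to whole SPARQL queries.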
RDF / SPARQL Solution Stack
[Diagram; components as shown on the slide:]
• Sources: DBs, services; other data sources; public data, services
• Access and conversion: SPARQL API, SPARQL conversion, ETL => RDF
• Extensible standards-based framework: federated access / integrated semantic datastore(s)
• Applications: prediction and simulation apps, query by meaning, rich browsing, other apps
Example Use Cases
• Manufacturing: link data sources across imprecise connections to verify reports
• Animal safety: knowledge network for discovery, qualification and validation of cross-species biomarkers
• Personalized medicine: knowledge network and screening application for personalized medicine
Example Questions
• What data sources support this specific manufacturing report about product purity and shelf life?
• What toxicity biomarkers are common to most animals?
• Which patients are showing combinations of indicators that predict risk?
Qualitative Benefits
• Link internal data with public resources
  ◦ e.g. “out of the box” linking of ELN data with LODD sources
  ◦ Growing public resources provide cost-effective enrichment and hypothesis generation
  ◦ Reduce dependencies on expensive commercial databases
• Emergent properties
  ◦ ELN data enrichment and knowledge building made easy!
  ◦ Supports inference, rich interrogation, serendipitous discovery
  ◦ R&D concepts can now be translated to immediate consumer benefits
  ◦ Previously “out of reach” integrations and applications become practical
Quantitative Benefits
Reduced effort, time and cost to federate, update and extend to new datasets:
• Start sooner: reduced initial design and deployment burden
  ◦ Ontologies are explicit, can be engineered from or decoupled from resources, and can be altered without refactoring
• Finish sooner: reduced time to create and test integrations
  ◦ Agile modeling, integration and testing
• Extend more easily: add new data sources and applications
  ◦ Building blocks to add new data and create new applications
  ◦ End users can adapt, modify and extend ontologies
UBC: Knowledge Network for Organ Failure; “ASK” for Personalized Medicine
UBC: Knowledge Network for Organ Failure; “ASK” for Personalized Medicine
Outcome:
• Knowledge network for enrichment, visualization and qualification of patterns indicating risk of organ failure
• Web-based deployment of SPARQL-based screening patterns across multiple data sources, indicating patients at risk
Personalized Medicine Knowledge Network
Screening of transplant patients for likelihood of transplant failure, based on combined biomarker patterns
• Web-based knowledge application
  ◦ Applies patterns for predictive screening
  ◦ Weighting and scoring of results
  ◦ Brings “hits” back into the knowledge network for validation of hypotheses and algorithms
ASK™
Extended SPARQL-based knowledge application:
• Visualization, testing and validation of systems-oriented hypotheses
• Automation, dashboards
UBC / PROOF: Quantitative…
• Integration and analysis time reduced from an estimated 2 years to about 8 months (FTE equivalent)
• Time to capture and apply patterns reduced from days to hours
• Knowledge base can be (and has been) extended to include new public sources in hours (days / weeks with curation)
• Expensive commercial database no longer needed, due to the ease of integrating public resources
UBC / PROOF: Qualitative…
• Visual SPARQL presents queries as hypotheses
  ◦ Makes it possible for researchers to iteratively create, test and refine hypotheses
• Research queries were published to a web service for easy scale-up
• Extended SPARQL delivers practically useful classifiers with scoring
• Use of RDF makes enrichment with public sources practical
• Provenance, reference annotation and original data are accessible for review
“[The] ability to consume and intuitively represent a wide variety of data-types - from images to quantitative data - and more importantly, display that data in ways that make the significant features immediately obvious to our biologist end-users, has allowed us to move to a completely new level of data analysis…”
Recap – Core Benefit Drivers
• Semantic technologies provide a standard method and framework for data publication and interoperability… not a new “data standard”!
• Low barrier to entry for data publishing: extensible building blocks for agile, growing integrations
• Reduced effort, time and cost to deliver and maintain data definitions and applications, particularly those that depend on federation
• Growing public resources are a catalyst and value-add
• Projects that were impractical become practical to achieve and maintain
Thank you! Next, W3C…
For questions:
Email: [email_address]
Website: www.io-informatics.com

Resource Description Framework Approach to Data Publication and Federation

  • 1. Resource Description Framework Approach to Data Publication and Federation Robert Stanley IO Informatics, Inc. IO Informatics © 2011 June 6 2011 Pistoia Alliance – Open Technical Committee Meeting
  • 2. Agenda Introduction: Requirements, Background, Use Cases Technical Example / Use Case: Requirements for creating a data service and invoking a query Q & A
  • 3. Recap - ELN Query Functional Requirements The ELN Query Services team have produced a rich set of functional requirements/user stories representing common questions scientists have that can be satisfied with an ELN query. Most, if not all, of those requirements have the conceptual form; SELECT <selection> FROM Experiment WHERE <constraints> … A common approach to the problem would be preferable , ideally leveraging existing standards, rather than building a solution that works just for Experiment. … the Pistoia Technical Committee may wish to constrain such query services, by aligning to existing standards, to ensure consistency in approach.
  • 4. Example ELN Query Workflow Scientist researching a class of Agents small molecules (or biologics) intended to hit a target or targets links to… Assays test to determine activity, affinity, binding, promiscuity determine potential toxicity, adverse events, etc. links to… Targets sites where compounds bind -- can be locations on a protein, locations on a gene, active centers on an enzyme, etc. links to… Disease/ Gene relationships e.g. biology, can be from TMO / LODD resources pathways, proteins, catalysts, immunology defense mechanisms, potential for adverse events, etc. can be included
  • 5. Recap - Query Service API As part of the phase two deliverable, the ELN Query Services team produced a prototype SOAP-based service outlining the methods that would be required to support the query service. There are existing protocols and standards for querying structured data over the web. Aligning to an existing approach will prevent re-inventing a query language and will provide confidence in the stability of the query interface. Examples; OData (http://www.odata.org/) is a RESTful query protocol … GData (http://code.google.com/apis/gdata/). Very similar to OData, but published by Google… SPARQL (http://www.w3.org/TR/rdf-sparql-query/) is a RDF query protocol published by W3C for querying linked data. A triple store is [not ] required and there may be synergies with existing Pistoia projects (e.g. VSI, SESL) by adopting SPARQL.
  • 6. Questions for Tech Committee Ontology content and format Regarding reference content How comprehensive and refined does it need to be? How can it relate to records and different detail? Who’s going to maintain the ontology? Is there a practical, agile approach to creating, managing, extending? Standards - driven? (UML, OWL, or ERD, etc.) Investigate service versioning and how backward compatibility might be implemented API Mechanism (OData, Gdata, SPARQL?) Is there a standards-based approach?
  • 7. Moving Forward? Resource Description Framework Semantic Technologies provide a unified, standard framework for: Ontology / API representation, modification, merging Query framework / SOA SPARQL API Rich, extensible Query Federation (Does not require ETL) Agile ontology development & maintenance… Customers can extend the ontologies themselves Broadly – these are standards-driven methods that will make the ELN Query Federation lifecycle much easier
  • 8. What is RDF? Resource Description Framework (“RDF”) defines and links data for more effective discovery, federation, integration and re-use across applications RDF is a fundamentally simple, standard and extensible way of specifying data and data relationships Ontologies describe resources and relationships according to their explicit meaning SPARQL Protocol and RDF Query Language (“SPARQL”) supports federated queries w’ SPARQL API for publication
  • 9. RDF graphs are collections of triples Triples are made up of a subject , a predicate , and an object Resources and relationships (metadata)are stored Resource Description Framework (RDF) is… A labeled, directed graph of relations between resources and literal values. Confidential IO Informatics © 2011 subject object predicate
  • 10. “ TP53 encodes Human p53” “ p53 is a tumor suppressor protein” “ TP53 gene is located on the short arm of chromosome 17” Example RDF Triples Confidential TP53 p53 encodes p53 tumor suppressor protein is a TP53 located chromosome 17
  • 11. Triples Connect to Form Graphs TP53 Confidential p53 encodes tumor suppressor protein is a located chromosome 17 part of Human part of
  • 12. Why RDF? What’s Different Here? Triples act as a human and machine readable least common denominator for expressing data and relationships Ontologies organize data according to their human readable meaning, to make defining and merging data intuitive… RDF supports inference and disambiguation , so merging data and adding new data and relationships without shared identifiers becomes possible… Confidential IO Informatics © 2011
  • 13. Maturation of a Standard Framework for Resource Description This standard method and framework for extensible open data publication is maturing: Standards and practices, ontologies, query and data model specification: W3C, ISCB, IEEE, NCBO, OBO SKOS, LOD / LODD and related resources Federation methods: Ontology Resources, URIs, SPARQL endpoints, ontology alignment, inference, D2R, SWObjects, … Scalability: Security, support for transactional processing Practice: Expertise, training, larger projects (FDA, DOD, NASA, World Cup, Data.gov, Chevron, etc.) IO Informatics © 2011
  • 14. Exponentially Growing Resources 2007 2008 2009 2011 IO Informatics © 2011
  • 15. Healthcare / Life Sciences remains the largest data sector for SPARQL APIs
  • 16. Integration / Federation Options Data Sources RDBMS / REST / Web Services Basic Transformation Query-Based Transformation SPARQL 2 SQL (D2R, SWObjects) Federated Semantic Access / Datastore(s) SPARQL APIs / SOAP / REST Provenance Versioning Governance Meaning
  • 17. RDF / SPARQL Solution Stack DBs, Services SPARQL API, SPARQL conversion, ETL=>RDF Extensible Standards-based Framework Federated Access / Integrated Semantic Datastore(s) Prediction and Simulation Apps Query by Meaning Rich Browsing Other Data Sources Public Data, Services Other Apps
  • 18. Example Use Cases Manufacturing: Link data sources across imprecise connections to verify reports Animal Safety: Knowledge Network for discovery, qualification and validation of cross-species biomarkers Personalized Medicine: Knowledge Network and screening application for personalized medicine IO Informatics © 2011
  • 19. Example Questions What data sources support this specific manufacturing report about product purity and shelf-life? What toxicity biomarkers are common to most animals? What patients are showing combinations of indicators that predict risk? IO Informatics © 2011
  • 20. Qualitative Benefits Link internal data with public resources E.G. – “out of box” linking of ELN data with LODD sources Growing public resources provide cost-effective enrichment and hypothesis generation Reduce dependencies on expensive commercial databases Emergent Properties ELN data enrichment and knowledge building made easy! Supports inference, rich interrogation, serendipitous discovery R&D concepts can now be translated to immediate consumer benefits Previous “out of reach” integration and applications become practical IO Informatics © 2011
  • 21. Quantitative Benefits Reduced effort, time and cost to federate, update, extend to new datasets Start sooner - reduce initial design and deployment burden Ontologies are explicit, can be engineered from or decoupled from resources, can be altered without refactoring Finish sooner - reduced time to create and test integrations Agile modeling, integration and testing Extend more easily - add new data sources and applications Building blocks to add new data, create new applications End Users can adapt, modify, extend ontologies IO Informatics © 2011
  • 22. UBC: Knowledge Network for Organ Failure; “ASK” for Personalized Medicine IO Informatics © 2011
  • 23. UBC: Knowledge Network for Organ Failure; “ASK” for Personalized Medicine Outcome Knowledge network for enrichment, visualization and qualification of patterns indicating risk of organ failure Web-based deployment of SPARQL-based screening patterns across multiple data sources, indicating patients-at-risk IO Informatics © 2011
  • 24. Screening of transplant patients for likelihood of transplant failure, based on combined biomarker patterns Personalized Medicine Knowledge Network IO Informatics © 2011 Web-based Knowledge Application Applies patterns for predictive screening Weighing, scoring of results Bring “hits” back into Knowledge Network for validation of hypotheses and algorithms
  • 25. ASK™ Extended SPARQL-based Knowledge Application Visualization, Testing, Validation of Systems-oriented Hypotheses Automation, Dashboards Confidential IO Informatics © 2011
  • 26. UBC / PROOF: Quantitative… Integration and analysis time reduced from estimated 2 years to about 8 months FTE equivalent Time to capture and apply patterns reduced from days to hours Knowledge base can be / has been extended to include new public sources in hours (days / weeks with curation) Expensive commercial database no longer needed due to ease of integrating public resources IO Informatics © 2011
  • 27. UBC / PROOF: Qualitative… Visual SPARQL presents queries as hypotheses Make it possible for researchers to iteratively create, test and refine hypotheses Research queries were published to web service for easy scale-up Extended SPARQL delivers practically useful classifiers with scoring Use of RDF makes enrichment with public sources practical Provenance, reference annotation, original data accessible for review “ [The] ability to consume and intuitively represent a wide variety of data-types - from images to quantitative data - and more importantly, display that data in ways that make the significant features immediately obvious to our biologist end-users, has allowed us to move to a completely new level of data analysis….” IO Informatics © 2011
  • 28. Recap: Core Benefit Drivers. Semantic technologies provide a standard method and framework for data publication and interoperability… not a new “data standard”! Low barrier to entry for data publishing: extensible building blocks for agile, growing integrations. Reduced effort, time and cost to deliver and maintain data definitions and applications, particularly those that depend on federation. Growing public resources are a catalyst and value-add. Projects that were impractical become practical to achieve and maintain.
  • 29. Thank you! Next, W3C… For questions: Email: [email_address]. Website: www.io-informatics.com
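Slides 23 and 24 describe screening patterns that are applied across multiple data sources, with results weighted and scored so that “hits” can be brought back for validation. The deck does not show the extended SPARQL syntax used in ASK™, so the following is only a plain-Python sketch of the general idea under stated assumptions: each source is a list of (subject, predicate, value) assertions that share patient identifiers, federation is simple concatenation, and a pattern is a set of rules that award points. The predicate names `markerA_foldChange` and `markerB_level`, all thresholds, the points, and `RISK_THRESHOLD` are hypothetical and chosen for illustration, not taken from the UBC / PROOF project.

```python
from typing import Callable, Dict, List, Tuple

# A triple is (subject, predicate, value); values here are numeric literals.
Triple = Tuple[str, str, float]
# A rule is (predicate, condition on the value, points awarded if the condition holds).
Rule = Tuple[str, Callable[[float], bool], int]


def federate(*sources: List[Triple]) -> List[Triple]:
    """Merge sources by concatenation; shared patient IDs do the linking."""
    merged: List[Triple] = []
    for source in sources:
        merged.extend(source)
    return merged


def score_subjects(graph: List[Triple], rules: List[Rule]) -> Dict[str, int]:
    """Sum the points of every rule that each subject satisfies."""
    # Simplification: keep only one value per (subject, predicate) pair.
    props: Dict[str, Dict[str, float]] = {}
    for subject, predicate, value in graph:
        props.setdefault(subject, {})[predicate] = value
    scores: Dict[str, int] = {}
    for subject, values in props.items():
        scores[subject] = sum(
            points
            for predicate, condition, points in rules
            if predicate in values and condition(values[predicate])
        )
    return scores


# Hypothetical data from three separate sources (clinical, gene expression, proteomics).
clinical = [("patient:1", "serumCreatinine", 2.1), ("patient:2", "serumCreatinine", 1.0)]
expression = [("patient:1", "markerA_foldChange", 3.5), ("patient:2", "markerA_foldChange", 2.5)]
proteomics = [("patient:1", "markerB_level", 12.0), ("patient:2", "markerB_level", 40.0)]

# Hypothetical screening pattern: thresholds and points are illustrative only.
rules: List[Rule] = [
    ("serumCreatinine", lambda v: v > 1.5, 3),
    ("markerA_foldChange", lambda v: v > 2.0, 2),
    ("markerB_level", lambda v: v < 20.0, 1),
]

RISK_THRESHOLD = 4  # hypothetical cut-off for flagging a patient as at risk

scores = score_subjects(federate(clinical, expression, proteomics), rules)
at_risk = sorted(s for s, score in scores.items() if score >= RISK_THRESHOLD)
print(scores)   # {'patient:1': 6, 'patient:2': 2}
print(at_risk)  # ['patient:1']
```

The `at_risk` list plays the role of the “hits” on slide 24: candidates to be reviewed against the knowledge network, with the rules and weights refined as hypotheses are tested.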

Editor's Notes

  • #3: RDF is very simple. It provides a framework for describing things according to their relationships. A key value proposition for semantic technologies is to make it easy to describe and connect related data sources, so that they can be searched as deeply as we care to. Although there is much useful effort in the formal terminology space, the broad focus is on practical outcomes: specifically, providing a standard set of methods and a framework for describing and linking data. This is NOT a standard for what the minimum or perfect data definition should be! It is a standard way of defining and redefining data…
  • #4: Lowering barriers to interoperability by using standards does NOT mean using standardized, approved terms in an ontology or API. It means using a common framework for resource description, in which terms can easily be surfaced, visualized, tested and merged because of the common approach.
  • #6: SPARQL is an open-standard Query Service API. It provides a query language, supports a variety of visual query interfaces, and takes advantage of growing linked open data resources. SPARQL APIs can use SOAP and REST.
  • #7: Agile resource description is a key benefit of RDF and SPARQL. Although they can support highly formal resource description, RDF and SPARQL lower the barrier to interoperability caused by excessive concern about perfect specification of reference content. RDF opens up this conversation by providing a standard framework and practices for mixing, mapping and updating resource descriptions by data model providers AND consumers. RDF can also manage provenance and links to original data, full experiment records, etc. --------- SPARQL endpoints provide visible data models that can be visually managed by data providers. They can also be rapidly adapted by local users mapping to more or less well-formed APIs. Depending on user need, these can be narrowly practical application ontologies or formal OWL models. Application ontologies can be rapidly mapped to OWL ontologies and vice versa.
  • #8: Broadly, semantic technologies provide a standard framework and methods for describing and querying data resources and the connections between data elements. The technology space has been slowly growing and maturing over the past decade. People were writing papers and doing work in the area in the late ’90s. Tim Berners-Lee wrote an article in 2001 that was published in Scientific American and began popularizing the concepts. These methods make it possible to visualize and connect data in a way that makes sense both to humans and to computers. Using RDF and SPARQL, data descriptions can be modified by end users without refactoring the database or databases.
  • #12: Clockwise from 6 o’clock: Species, (protein description), Protein, Gene, Gene Location. Terminology around new methods can cause confusion. For ease, I find it useful at times to refer to “assertions” instead of “triples”, and to “data structure” instead of “ontologies”.
  • #14: These methods have been developed by W3C, MIT, Stanford, Dublin Core, DERI and elsewhere for about a decade. Vendor and consumer adoption and technical maturity have grown over the past 8 years or so, with exponential growth over the past 4 years. This compares very favorably to “proprietary” methods and standards invented by single vendors.
  • #22: Agile method: human-readable ontologies, no more over-specification, easy to manage and update; end users can extend the API.
  • #24: First we linked all of the data of interest to help the customer understand what happens biologically in organ transplant rejection scenarios. Next we applied SPARQL to run searches across clinical, protein and gene expression databases, to screen for patients at risk.
  • #29: Not a new data standard, but a standard way of publishing and linking data. Extensible, so you can start and get to a ‘good enough’ state quickly. Created with federation in mind. Out-of-the-box benefits for project completion, maintenance and extension to new data sources.
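Notes #3, #8 and #12 describe RDF as a set of simple assertions (triples) about how things relate, which can be queried by pattern and extended without refactoring a database. The following is a minimal in-memory sketch of that idea, not an RDF library or a SPARQL engine: it matches a list of triple patterns (variables start with `?`, as in SPARQL) against a list of assertions and joins the results. The `ex:` vocabulary and resource names are hypothetical, loosely following the Species / Protein / Gene / Gene Location example in note #12.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # one assertion: (subject, predicate, object)
Binding = Dict[str, str]       # variable name -> matched value


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, following SPARQL's convention."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None if impossible."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all patterns (a basic graph pattern join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Hypothetical "ex:" vocabulary; each triple is one assertion.
graph: List[Triple] = [
    ("ex:TP53_protein", "ex:foundIn", "ex:HomoSapiens"),
    ("ex:TP53_protein", "ex:encodedBy", "ex:TP53_gene"),
    ("ex:TP53_gene", "ex:locatedAt", "ex:chr17p13.1"),
]

located = query(graph, [
    ("?protein", "ex:encodedBy", "?gene"),
    ("?gene", "ex:locatedAt", "?location"),
])
print(located)
# [{'?protein': 'ex:TP53_protein', '?gene': 'ex:TP53_gene', '?location': 'ex:chr17p13.1'}]

# Extending the description is just adding an assertion; no schema change is needed.
graph.append(("ex:TP53_gene", "ex:associatedWith", "ex:LiFraumeniSyndrome"))
diseases = query(graph, [
    ("?protein", "ex:encodedBy", "?gene"),
    ("?gene", "ex:associatedWith", "?disease"),
])
print(diseases)
# [{'?protein': 'ex:TP53_protein', '?gene': 'ex:TP53_gene', '?disease': 'ex:LiFraumeniSyndrome'}]
```

Each `query` call corresponds roughly to the WHERE clause of a SPARQL SELECT with two triple patterns. A real deployment would use a triple store or SPARQL endpoint, which adds indexing, typed literals, namespaces and federation over HTTP.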