CMDI 1.2
Improvements in the CLARIN Component Metadata Infrastructure
Mitchell Seaton
Center for Language Technology
Department of Nordic Research, UCPH
seaton@hum.ku.dk
10 May 2016
CLARIN Centre Meeting, Utrecht
CMDI
• Schema
  ◦ Component-based modelling, flexible
  ◦ Specification language (CCSL)
• Metadata
  ◦ CMDI records (instances) are based on CMD profiles
• Standardisation
  ◦ CMD model
  ◦ ISO 24622-1:2015
• Semantics
  ◦ Consists of semantic annotations (concepts, CCR)
CMDI Framework
[Figure: CMDI framework. Metadata records are instances of (instanceOf) profiles (Profile 1, Profile 2, Profile 3); profiles contain components (Comp A, Comp B, Comp C); components link to concept definitions.]
CMDI Taskforce
https://trac.clarin.eu/wiki/Taskforces/CMDI
• Responsibilities: writing specifications, documentation, software development, and integrations.
• Contact: cmdi@clarin.eu
• Coordinators:
  ◦ Twan Goosen (CLARIN ERIC)
  ◦ Menzo Windhouwer (Meertens Institute, MPI for Psycholinguistics)
CMDI 1.2 New Features
• Lifecycle Management
• Open/External Vocabularies
• Cues for Tools
• Value Derivation
• Mandatory Attributes
CMDI 1.2 Improvements
• XML compliance, cleaner XSD schema, enhanced validation (Schematron assertions)
• CMD record envelope changes from 1.1
  ◦ <IsPartOfList> moved
  ◦ Stricter <ResourceRelation>
  ◦ Single ResourceProxy reference (@cmd:ref) on Components
• Namespaces
  ◦ Reserved attributes (@cmd:ref, @cmd:componentId)
  ◦ Profile-specific payload namespace (cmdp)
• Documentation
  ◦ Multilingual (@xml:lang)
  ◦ Component and Attribute levels (CCSL)
CMDI 1.2 Model
Lifecycle Management
• Defined in the Header elements of the component specification
• <Status> ('development', 'production', 'deprecated')
• <StatusComment> (for deprecated)
• <Successor>
• <DerivedFrom>
[Figure: example lifecycle of the TextCorpus component and a derived Corpus component]
Component       Status        Successor   DerivedFrom
TextCorpus v1   deprecated
TextCorpus v2   production    v3          v1
TextCorpus v3   development               v2
Corpus          production                v2 (derived)
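As a rough illustration of how a tool might read these lifecycle fields, the Python sketch below parses a hypothetical component specification header. Only the element names <Status>, <StatusComment>, <Successor> and <DerivedFrom> come from the slide; the surrounding layout, the identifier values, and the function name read_lifecycle are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical component specification header. The nesting and the
# identifier values are illustrative assumptions, not taken from the spec.
SPEC = """
<ComponentSpec>
  <Header>
    <Name>ExampleComponent</Name>
    <Status>deprecated</Status>
    <StatusComment>Superseded by a newer version</StatusComment>
    <Successor>example:ExampleComponent-v2</Successor>
    <DerivedFrom>example:ExampleComponent-v0</DerivedFrom>
  </Header>
</ComponentSpec>
"""

# The three status values listed on the slide.
ALLOWED_STATUS = {"development", "production", "deprecated"}


def read_lifecycle(spec_xml):
    """Return the lifecycle fields of a component spec header as a dict."""
    header = ET.fromstring(spec_xml).find("Header")
    if header is None:
        raise ValueError("specification has no <Header>")
    status = header.findtext("Status")
    if status not in ALLOWED_STATUS:
        raise ValueError(f"unexpected status: {status!r}")
    return {
        "status": status,
        "comment": header.findtext("StatusComment"),
        "successor": header.findtext("Successor"),
        "derived_from": header.findtext("DerivedFrom"),
    }


info = read_lifecycle(SPEC)
if info["status"] == "deprecated" and info["successor"]:
    # A tool could point users of a deprecated component at its successor.
    print("deprecated, use instead:", info["successor"])
```

An editor or registry front end could use such a check to warn when a record is based on a deprecated component and to suggest the successor.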
Namespaces
• Global namespaces
  ◦ CMDI instance (general/envelope) namespace
    cmd - http://www.clarin.eu/cmd/1
  ◦ Cues for tools namespace
    cue - http://www.clarin.eu/cmd/cues/1
• Profile-specific payload namespace (cmdp)
    cmdp - http://www.clarin.eu/cmd/1/profiles/{profileId}
    {profileId}: identifier of the profile from which the schema is derived
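To make the split between envelope and payload namespaces concrete, the following Python sketch builds the profile-specific namespace from a profile identifier and uses it to read a value from a minimal record. The profile identifier, the record layout (a Components wrapper holding a TextCorpus payload), and the Title element are hypothetical and serve only to show the namespace mechanics.

```python
import xml.etree.ElementTree as ET

# Envelope namespace and payload namespace pattern as given on the slide.
CMD_NS = "http://www.clarin.eu/cmd/1"
PROFILE_NS_TEMPLATE = "http://www.clarin.eu/cmd/1/profiles/{profileId}"


def profile_namespace(profile_id):
    """Build the profile-specific payload namespace for a profile ID."""
    return PROFILE_NS_TEMPLATE.format(profileId=profile_id)


PROFILE_ID = "example:p_0001"  # hypothetical identifier, not a real profile
ns = {"cmd": CMD_NS, "cmdp": profile_namespace(PROFILE_ID)}

# Minimal illustrative record; element names below the root are assumptions.
RECORD = f"""
<cmd:CMD xmlns:cmd="{ns['cmd']}" xmlns:cmdp="{ns['cmdp']}">
  <cmd:Components>
    <cmdp:TextCorpus>
      <cmdp:Title>Example corpus</cmdp:Title>
    </cmdp:TextCorpus>
  </cmd:Components>
</cmd:CMD>
"""

root = ET.fromstring(RECORD)
# Envelope elements resolve via cmd, payload elements via cmdp.
title = root.findtext("cmd:Components/cmdp:TextCorpus/cmdp:Title", namespaces=ns)
print(title)
```

Because the payload namespace depends on the profile, a generic tool would typically derive it per record rather than hard-code it.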
Open Vocabularies
• CLAVAS (OpenSKOS) vocabulary service
  ◦ https://openskos.meertens.knaw.nl/clavas/
  ◦ Hosted by Meertens (NL)
• Defined in new <Vocabulary> element
• External vocabulary (referenced by @URI attribute)
  ◦ May use @cmd:ValueConceptLink attribute in a metadata record as link to vocabulary entry
• Controlled vocabulary (imported items, sub-set)
• Localisation (@ValueLanguage attribute)
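The link from a record value to a vocabulary entry can be illustrated with a short Python sketch that reads the @cmd:ValueConceptLink attribute from an element. The Language element, the payload namespace, and the concept URI are hypothetical; only the attribute name and the cmd namespace are taken from the slides.

```python
import xml.etree.ElementTree as ET

CMD_NS = "http://www.clarin.eu/cmd/1"
# Hypothetical payload namespace for an example profile.
PAYLOAD_NS = "http://www.clarin.eu/cmd/1/profiles/example:p_0001"

# Illustrative record fragment; the element and the concept URI are made up.
FRAGMENT = f"""
<cmdp:Language xmlns:cmd="{CMD_NS}" xmlns:cmdp="{PAYLOAD_NS}"
    cmd:ValueConceptLink="http://example.org/vocab/languages/eng">English</cmdp:Language>
"""


def value_and_concept(element):
    """Return the element text and its vocabulary entry link, if any."""
    # ElementTree addresses namespaced attributes as "{namespace}name".
    link = element.get(f"{{{CMD_NS}}}ValueConceptLink")
    return element.text, link


elem = ET.fromstring(FRAGMENT)
value, concept = value_and_concept(elem)
print(value, "->", concept)
```

The attribute is optional, so value_and_concept returns None for the link when a value was entered without reference to a vocabulary entry.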
Cues for tools
• New XML namespace (cue):
  ◦ http://www.clarin.eu/cmd/cues/1
• Use cases:
  ◦ Enhanced documentation capabilities (localisation)
  ◦ Presentation hints (editors/browser tools)
Auto Value Derivation
• Optional <AutoValue> CCSL element on CMD Elements and Attributes
• Derive content for an Element/Attribute from other values
• Value may give information about a derivation function
• No provided set of derivation functions
• @cmd:AutoValue attribute used in the generated schema definition
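Since no set of derivation functions is provided, each tool would have to decide which hints it understands. The Python sketch below shows one possible tool-side approach, a small registry that maps an AutoValue hint to a function. The hint names resource-count and first-resource, the record dictionary, and all function names are hypothetical and not part of CMDI.

```python
# Registry of derivation functions known to this (hypothetical) tool.
DERIVATIONS = {}


def derivation(name):
    """Decorator that registers a derivation function under a hint name."""
    def register(func):
        DERIVATIONS[name] = func
        return func
    return register


@derivation("resource-count")
def resource_count(record):
    # Derive a value from other content of the record.
    return str(len(record["resources"]))


@derivation("first-resource")
def first_resource(record):
    return record["resources"][0] if record["resources"] else ""


def derive(auto_value, record):
    """Return the derived value, or None if the hint is not understood."""
    func = DERIVATIONS.get(auto_value)
    return func(record) if func else None


record = {"resources": ["a.wav", "a.txt"]}
print(derive("resource-count", record))
print(derive("unknown-hint", record))
```

Returning None for an unknown hint lets a tool fall back to manual entry instead of failing on hints written for other tools.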
CMDI 1.2 Specification Draft
• Formal specification draft
• Review cycle
  ◦ https://trac.clarin.eu/wiki/CMDI%201.2/Specification
Conclusion
• Development/review of CMDI 1.2 specification draft
• Implementation of CMDI 1.2
• CMDI 1.2 rollout/release
