Possibilities For Structuring Data in The Wikidata Format For Data Stored in Blockchain
Copyright © Marcelo Ribeiro de Oliveira Mello et al. This is an open access article distributed under the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Marcelo Ribeiro de Oliveira Mello and Patrick Letouzé Moreira. “Possibilities for structuring data in the wikidata format for data stored in
blockchain”, International Journal of Development Research, 12, (03), 54867-54871.
International Journal of Development Research, Vol. 12, Issue 03, pp. 54867-54871, March 2022

(Blockchain) for provision of metadata in a format compatible with Wikidata. To achieve this goal, I carried out a scope review of the scientific literature that addresses the use of blockchains to store structured semantic data in the Wikidata format.

SCOPE REVIEW AND METHODOLOGY

The present work adopted a methodology similar to a systematic review, known as a scope review. Such a review is important as a beacon to guide deeper future work and research. According to Petticrew and Roberts (2008), this methodology involves searching and evaluating the existing literature to determine which categories of studies can be considered, as a way of refining the research question. This opens the way for a future systematic review. The search involves knowing where such records are published and in which scientific databases they are located. According to the same authors, the search is typically restricted by the language and publication date of the records. On the other hand, it is important to highlight that a typical systematic review follows a set of steps for its execution, according to the Cochrane Handbook (Higgins et al., 2022). These steps are similar to the following: a) formulation of the research question; b) location of studies; c) critical evaluation of studies; d) data collection; e) grouping and presentation of data; f) interpretation of data.

The steps adopted in this scope review were inferred from the explanations of both sources mentioned above. They are:

Choice of relevant scientific literature databases.
Definition of search keywords.
Definition of inclusion and exclusion criteria.
Search and exclusions based on criteria and scope analysis.
In-depth analysis of the studies included and tabulation of what was learned.

Choice of relevant scientific literature databases: The selection of studies was performed by consulting the following databases:

ACM Digital Library (http://portal.acm.org);
Emerald (http://www.emeraldinsight.com/search.htm);
Google Scholar (http://scholar.google.com.br);
IEEE Xplore (http://ieeexplore.ieee.org/Xplore/home.jsp);
ScienceDirect Elsevier (http://www.elsevier.com).

Definition of search keywords: The following keywords were used to search for studies. Initially, the keywords were selected and refined so as to limit the number of resulting records that had no connection with the objectives of this work.

"blockchain" AND "Wikidata";
"smart contracts" AND "Wikidata".

Definition of inclusion and exclusion criteria: To determine the validity of the selected studies, the following inclusion and exclusion criteria were used:

Inclusion Criteria (CI)
CI1 - The record must be related to information technology.
CI2 - The record must be related to data storage in the Wikidata format in a network based on blockchain technology.
CI3 - The record must be in English or Portuguese.
CI4 - The record must date from 2017 (about five years after the launch of the Wikidata project) to 2021.

Exclusion Criteria (CE)
CE1 - Records prior to 2017.
CE2 - Repeated records.
CE3 - Records unavailable or inaccessible electronically.
CE4 - Records that are not scientific articles or theses.
CE5 - Records in a language other than English and Portuguese.
CE6 - Proceedings of congresses, conferences, and the like.

Search and exclusions based on criteria and scope analysis: Searching the databases with the keywords returned 156 preliminary studies. These were then filtered by the exclusion criteria, which left 69 studies. The remaining studies were organized in an intermediate table containing their titles and abstracts and were analyzed one by one against the inclusion criteria. Finally, 10 studies met the inclusion criteria and were included. Table 1 illustrates the scope analysis flow and the number of studies remaining after each step.

Table 1. Scope Analysis Flow: Search Results by Keywords with application of filters based on exclusion and inclusion criteria

RESULTS

This section presents the results of the scope analysis of the 10 included studies. It is noteworthy that none of these 10 was excluded after in-depth analysis; that is, all of them remained in the results table.

Studies included based on the criteria: 10 of the 156 studies found in the searches were included after analysis of titles and abstracts and application of the inclusion and exclusion criteria. The included records are listed below.

1) Domingue, J., Third, A., & Ramachandran, M. (2019, May). The FAIR TRADE framework for assessing decentralised data solutions.

Abstract: Decentralized data solutions bring their own sets of capabilities, requirements and issues not necessarily present in centralized solutions. In order to compare the properties of different approaches or tools for management of decentralized data, it is important to have a common evaluation framework. We present a set
of dimensions relevant to data management in decentralized contexts and use them to define principles extending the FAIR framework, initially developed for open research data. By characterizing a range of different data solutions or approaches by how Trusted, Autonomous, Distributed and Decentralized, in addition to how Findable, Accessible, Interoperable and Reusable, they are, we show that our FAIR TRADE framework is useful for describing and evaluating the management of decentralized data solutions, and aim to contribute to the development of best practice in a developing field.

2) Beris, T., Angelidis, I., Chalkidis, I., Nikolaou, C., Papaloukas, C., Soursos, P., & Koubarakis, M. (2019, May). Towards a decentralized, trusted, intelligent and linked public sector: A report from the Greek trenches.

Abstract: This paper is a progress report on our recent work on two applications that use Linked Data and Distributed Ledger technologies and aim to transform the Greek public sector into a decentralized, trusted, intelligent and linked organization. The first application is a re-engineering of Diavgeia, the Greek government portal for open and transparent public administration. The second application is Nomothesia, a new portal that we have built, which makes Greek legislation available on the Web as linked data to enable its effective use by citizens, legal professionals and software developers who would like to build new applications that utilize Greek legislation. The presented applications have been implemented without funding from any source and are available for free to any part of the Greek public sector that may want to use them. An important goal of this paper is to present the lessons learned from this effort.

3) Leal, F., Veloso, B., Malheiro, B., González-Vélez, H., & Burguillo, J. C. (2020). A 2020 perspective on “scalable modelling and recommendation using wiki-based crowdsourced repositories:” fairness, scalability, and real-time recommendation.

Abstract: Wiki-based crowdsourced data sources generally lack reliability, as their provenance is not intrinsically marshalled. By using recommendation, one may arguably assess the reliability of wiki-based repositories in order to identify the most interesting articles for a given domain. In this commentary, we explore current trends in scalable modelling and recommendation methods based on side information such as the quality and popularity of wiki articles. The systematic parallelization of such profiling and recommendation algorithms allows the concurrent processing of distributed crowdsourced Wikidata repositories. These algorithms, which perform incremental updating, need further research to improve the performance and generate up-to-date high-quality recommendations. This article builds upon our previous work (Leal et al., 2019) by extending the literature review and identifying important trends and challenges pertaining to crowdsourcing platforms, particularly those of Wikidata provenance.

4) Shrestha, A. K., & Vassileva, J. (2018, June). Blockchain-based research data sharing framework for incentivizing the data owners.

Abstract: Data sharing practices are much needed to maximize the knowledge gain by researchers. However, when and what data should be shared with whom, and how credit should be awarded to the data owner needs to be clearly addressed to create an individual incentive for data owners to share their data. A platform that allows owners to control and get rewards from sharing their data would be an important enabler of research data-sharing, since presently, such incentives for researchers to share their data are largely missing. Our approach delivers a usable blockchain based model for a collection of researchers’ data, providing accountability of access, maintaining the complete and updated information, and a verifiable record of the provenance, including all accesses/sharing/usages of the data. Data owners will not only enjoy increased transparency and protection of data from falling into the wrong hands, but they will also be incentivized with digital tokens, acknowledgment, or both to share their data with the interested data seekers, thus becoming active participants that stand to benefit from the research data economy.

5) Rojas, Remy. "RDF management založený na technologii Blockchain." (2019).

Abstract: As Structured open data sees a growth in popularity evidenced by the size of networks such as the Linked Open Data (LOD) cloud, aspects of its lifecycle management and scalability have yet to be addressed. At the time of writing, implementations of change tracking and provenance do not guarantee integrity and availability and depend upon individual domain owners to be deployed and maintained. This represents a threat to the stability of a system in which data is composed of cross-domain URI references such as the Semantic Web's de-facto model: RDF. In this paper we explore the advantages and capabilities a solution based on Blockchain can provide when used as a support for RDF. We provide the design, implementation, testing, and evaluation of a Proof of Concept Distributed Ledger which addresses the use-cases of Create, Read, Update, Delete (CRUD) operations, Linked Data Notifications, and Publish/Subscribe Observer pattern. Our solution provides mutually distrusting parties a support for traceability and provenance of versioned RDF statements, leveraging integrity and availability with decentralization.

6) Kirstein, F. (2019). A Decentralized Provenance Network for Linked Open Data.

Abstract: With the growing availability of Linked Open Data (LOD) and the consequential generation of derived and aggregated data, the need for trustworthy, reproducible and accessible provenance information has increased. Yet, no consistent mechanism has been established to manage provenance data of LOD on a global dataset-level. Decentralized networks and peer-to-peer mechanisms have made their revival in the last years with blockchain and similar distributed ledger technologies. We propose a novel approach to track and store provenance information for LOD on a dataset-level by sharing an immutable, common state between data providers. The basic architecture will not disrupt existing methodologies and standards for publishing LOD, but will be transparently integrated into existing ecosystems as an additional layer to foster broad acceptance. We will investigate the application of emerging blockchain technologies and established Linked Data specifications for building this decentralized anchor of truth. We are actively involved in the design and implementation of LOD and Open Data platforms and will evaluate our approach in real-world scenarios regarding feasibility, governance, scalability and usability.

7) Brutzman, D. P., Blais, C. L., & Wu, H. F. (2020). Ethical Control of Unmanned Systems: Lifesaving/Lethal Scenarios for Naval Operations.

Abstract: This research in Ethical Control of Unmanned Systems applies precepts of Network Optional Warfare (NOW) to develop a three-step Mission Execution Ontology (MEO) methodology for validating, simulating, and implementing mission orders for unmanned systems. First, mission orders are represented in ontologies that are understandable by humans and readable by machines. Next, MEO is validated and tested for logical coherence using Semantic Web standards. The validated MEO is refined for implementation in simulation and visualization. This process is iterated until the MEO is ready for implementation. This methodology is applied to four Naval scenarios in order of increasing challenges that the operational environment and the adversary impose on the Human-Machine Team. The extent of challenge to Ethical Control in the scenarios is used to refine the MEO for the unmanned system. The research also considers Data-Centric Security and blockchain distributed ledger as enabling technologies for Ethical Control. Data-Centric Security is a combination of structured messaging, efficient compression, digital signature, and document encryption, in correct order, for round-trip messaging.

8) Aebeloe, C., Montoya, G., & Hose, K. (2021, April). ColChain: Collaborative linked data networks.
Abstract: One of the major obstacles that currently prevents the Semantic Web from exploiting its full potential is that the data it provides access to is sometimes not available or outdated. The reason is rooted deep within its architecture that relies on data providers to keep the data available, queryable, and up to date at all times – an expectation that many data providers in reality cannot live up to for an extended (or infinite) period of time. Hence, decentralized architectures have recently been proposed that use replication to keep the data available in case the data provider fails. Although this increases availability, it does not help keeping the data up to date or allow users to query and access previous versions of a dataset. In this paper, we therefore propose ColChain (Collaborative knowledge CHAINs), a novel decentralized architecture based on blockchains that not only lowers the burden for the data providers but at the same time also allows users to propose updates to faulty or outdated data, trace updates back to their origin, and query older versions of the data. Our extensive experiments show that ColChain reaches these goals while achieving query processing performance comparable to the state of the art.

9) Abbas, N., Alghamdi, K., Alinam, M., Alloatti, F., Amaral, G., d'Amato, C., ... & Xu, W. (2020). Knowledge Graphs Evolution and Preservation.

Abstract: One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web" and described in its report is that of a:

"Public FAIR Knowledge Graph of Everything: We increasingly see the creation of knowledge graphs that capture information about the entirety of a class of entities. [...] This grand challenge extends this further by asking if we can create a knowledge graph of "everything" ranging from common sense concepts to location based entities. This knowledge graph should be "open to the public" in a FAIR manner democratizing this mass amount of knowledge."

Although linked open data (LOD) is one knowledge graph, it is the closest realization (and probably the only one) to a public FAIR Knowledge Graph (KG) of everything. Surely, LOD provides a unique testbed for experimenting and evaluating research hypotheses on open and FAIR KG. One of the most neglected FAIR issues about KGs is their ongoing evolution and long term preservation. We want to investigate this problem, that is to understand what preserving and supporting the evolution of KGs means and how these problems can be addressed. Clearly, the problem can be approached from different perspectives and may require the development of different approaches, including new theories, ontologies, metrics, strategies, procedures, etc. This document reports a collaborative effort performed by 9 teams of students, each guided by a senior researcher as their mentor, attending the International Semantic Web Research School (ISWS 2019). Each team provides a different perspective to the problem of knowledge graph evolution substantiated by a set of research questions as the main subject of their investigation. In addition, they provide their working definition for KG preservation and evolution.

10) Bucur, C. I., Ciroku, F., Makhalova, T., Rizza, E., Thanapalasingam, T., Varanka, D., Wolowyk, M. & Domingue, J. (2019). A decentralized approach to validating personal data using a combination of blockchains and linked data.

Abstract: Linked Open Data (LOD) is the publicly available RDF data in the Web. Each LOD entity is identified by a URI and accessible via HTTP. LOD encodes global scale knowledge potentially available to any human as well as artificial intelligence that may want to benefit from it as background knowledge for supporting their tasks. LOD has emerged as the backbone of applications in diverse fields such as Natural Language Processing, Information Retrieval, Computer Vision, Speech Recognition, and many more. Nevertheless, regardless of the specific tasks that LOD-based tools aim to address, the reuse of such knowledge may be challenging for diverse reasons, e.g. semantic heterogeneity, provenance, and data quality. As aptly stated by Heath et al. "Linked Data might be outdated, imprecise, or simply wrong": there arouses a necessity to investigate the problem of linked data validity. This work reports a collaborative effort performed by nine teams of students, guided by an equal number of senior researchers, attending the International Semantic Web Research School (ISWS 2018) towards addressing such investigation from different perspectives coupled with different approaches to tackle the issue.

Positioning bias of studies on blockchain use with Wikidata.

Study 1: Positive: The article discusses several common principles for evaluating decentralized data platforms. These include platforms that store, through blockchain networks, metadata based on RDF (Resource Description Framework) and queried via SPARQL (SPARQL Protocol and RDF Query Language). These standards are also used by Wikidata, so such platforms are compatible with it. Although the article does not literally mention the term "Wikidata", it gives considerable emphasis to compatibility with information stored in this format.

Study 2: Positive: Blockchain is used to provide immutability for public sector documents encoded in RDF, with interconnection to Wikidata for sophisticated searches and the generation of knowledge trees in the context of government data.

Study 3: Positive: As a future trend, the authors aim to use emerging technologies such as blockchain, taking advantage of their immutability, transparency and traceability, to pursue data accuracy in Wikidata repositories, which are generated and maintained by crowds.

Study 4: Neutral: Sharing personal data via blockchain. Although this article proposes sharing personal data in a blockchain network, the form of storage was not made explicit, and Wikidata was cited as a related solution with a different architecture.

Study 5: Positive: Interest in the provenance of Linked Open Data (LOD) is growing. The article addresses the need to guarantee integrity and availability when recording changes, as well as in other steps of data provenance. Thus, the author proposes "A system that responds to the needs for monitoring changes and provenance in the Semantic Web with a guarantee of availability and integrity. For this guarantee, the system is inspired by trends in Blockchain technology and by the parallels between decentralized applications and the Semantic Web."

Study 6: Positive: A decentralized network using blockchain, specifically Hyperledger Fabric, for the provenance of LOD data, with RDF standards and modeling. The work recognizes the impact that the adoption of LOD has on the internet, pointing to Wikidata as a popular publisher.

Study 7: Neutral: Work on ethical control of (military) missions in unmanned systems. It proposes a format for storing military mission order data, called MEO (Mission Execution Ontology). This storage should occur ethically, taking advantage of blockchain principles. Despite not mentioning the term "Wikidata", the proposed work is broadly compatible with RDF-based metadata, and it uses SPARQL queries over the metadata in the tests of MEO concepts.

Study 8: Positive: Knowledge graphs (KG) are stored reliably through blockchain networks with RDF encoding. The approach allows the user community to retrieve and change information, similarly to Wikidata users.
Study 9: Positive: A set of studies on the problems of dealing with knowledge graphs and their preservation. Chapter six specifically deals with using a blockchain platform for SPARQL-compatible data preservation, which implies a direct relationship with Wikidata. The platform is called VAD2ER (Volunteer Anonymous Decentralized Data to Empower Research), an architecture for preserving, evolving, and sharing knowledge graphs of confidential and private information, based on decentralized technologies such as blockchain and Solid.

Study 10: Neutral: A set of articles dealing with LOD and RDF. Chapter nine specifically addresses a personal data validation model that combines LOD with blockchain. Although Wikidata is not mentioned, the linked data format is strongly correlated with that database.

CONCLUSION

To the best of our knowledge, as can be seen from the table presented, none of the 10 studies was classified as negative regarding its positioning on blockchain metadata storage in a Wikidata-compatible format; 7 studies presented a favorable bias and 3 a neutral bias. It is important to note that this bias reflects the current state of metadata provisioning technologies and processes through blockchain networks. As this work found, studies are beginning to emerge that propose combining the immutability guarantee provided by blockchain networks with the storage of LOD metadata based on RDF. This storage format is intelligible to both humans and machines and is interoperable with Wikidata. This semantic data storage format provides a pattern that is apparently being followed as a target horizon for cataloging informational objects within the scope of human knowledge. This can mean the secure and interchangeable representation of any information object within a society made up of both intelligent humans and machines. Blockchain technologies, in P2P networks, have been studied in academia and in public and private environments as a means of storing Wikidata-compatible metadata in a transparent, auditable and immutable way.

Acknowledgment: To God, my beloved creator, for giving me life and the capacity to accomplish this work. To my caring, loving and supportive wife, Francielle, and my children, Daniel, João Pedro and Ester, for all their encouragement and for filling my life with joy.

REFERENCES

Abbas, N., Alghamdi, K., Alinam, M., Alloatti, F., Amaral, G., d'Amato, C., ... & Xu, W. (2020). Knowledge Graphs Evolution and Preservation--A Technical Report from ISWS 2019. arXiv preprint arXiv:2012.11936.
Aebeloe, C., Montoya, G., & Hose, K. (2021, April). ColChain: Collaborative linked data networks. In Proceedings of the Web Conference 2021 (pp. 1385-1396).
Alakananda, K. P., Hegde, P., Sangeetha, B. K. & Ramya, M. (2017, 11). Block chain technology: Paradigm shift in Business. International Conference People Connect: Networking For Sustainable Development (IJCRT), 1(14), 91-98.
Beris, T., Angelidis, I., Chalkidis, I., Nikolaou, C., Papaloukas, C., Soursos, P., & Koubarakis, M. (2019, May). Towards a decentralized, trusted, intelligent and linked public sector: A report from the Greek trenches. In Companion Proceedings of The 2019 World Wide Web Conference (pp. 840-849).
Brutzman, D. P., Blais, C. L., & Wu, H. F. (2020). Ethical Control of Unmanned Systems: Lifesaving/Lethal Scenarios for Naval Operations. Naval Postgraduate School, Monterey, CA, United States.
Bucur, C. I., Ciroku, F., Makhalova, T., Rizza, E., Thanapalasingam, T., Varanka, D., Wolowyk, M. & Domingue, J. (2019). A decentralized approach to validating personal data using a combination of blockchains and linked data. International Semantic Web Research Summer School. Linked Open Data Validity - A Technical Report from ISWS 2018, 2019, pp. 88-97.
De Angelis, S. (2018). Assessing security and performances of consensus algorithms for permissioned blockchains. arXiv preprint arXiv:1805.03490.
Domingue, J., Third, A., & Ramachandran, M. (2019, May). The FAIR TRADE framework for assessing decentralised data solutions. In Companion Proceedings of The 2019 World Wide Web Conference (pp. 866-882).
Fernando, S., Barbosa, G., Letouze, P., Brito, G. (2020). Estudo para Implementação de Metadados nos Documentos Digitais do Judiciário Tocantinense. Chapter 2, pp. 41-51.
Higgins, J.P.T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M.J., Welch, V.A. (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Cochrane, 2022. Available from www.training.cochrane.org/handbook.
Kirstein, F. (2019). A Decentralized Provenance Network for Linked Open Data.
Leal, F., Veloso, B., Malheiro, B., González-Vélez, H., & Burguillo, J. C. (2020). A 2020 perspective on “scalable modelling and recommendation using wiki-based crowdsourced repositories:” fairness, scalability, and real-time recommendation. Electronic Commerce Research and Applications, 40, 100951.
Lunardi, R. C., Nunes, H. C., Branco, V. D. S., Lipper, B. H., Neu, C. V., & Zorzo, A. F. (2019). Performance and cost evaluation of smart contracts in collaborative health care environments. arXiv preprint arXiv:1912.09773.
Monteiro, S. D. (2018). A vida secreta dos metadados no wikidata: um enfoque sobre o sentido na (web) semântica formal. Informação & Sociedade: Estudos, 28(1).
Petticrew, M., & Roberts, H. (2008). Systematic reviews in the social sciences: A practical guide. John Wiley & Sons.
Rojas, R. (2019). RDF management založený na technologii Blockchain (Master's thesis, České vysoké učení technické v Praze. Vypočetní a informační centrum.).
Shrestha, A. K., & Vassileva, J. (2018, June). Blockchain-based research data sharing framework for incentivizing the data owners. In International Conference on Blockchain (pp. 259-266). Springer, Cham.
Tharani, K. (2021). Much more than a mere technology: A systematic review of Wikidata in libraries. The Journal of Academic Librarianship, 47(2), 102326.
Wikimedia Foundation. (2012). Welcome to the Wikibase project! [Online]. Available: https://wikiba.se/
*******
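As a closing illustration of the combination discussed in the conclusion (RDF-style statements whose history is protected by a blockchain-like structure), the sketch below chains subject-predicate-object statements with SHA-256 hashes so that any later change to a stored statement can be detected. It is only a minimal, single-machine sketch under stated assumptions: the function names (block_hash, append_statement, verify) and the GENESIS constant are hypothetical, the statements use Wikidata-style prefixed identifiers as plain strings, and none of this reproduces the design of any reviewed study. A real blockchain network would add distribution, consensus and access control on top of it.

```python
import hashlib
import json

# Assumption: a fixed placeholder stands in for the "previous hash" of the first block.
GENESIS = "0" * 64


def block_hash(prev_hash, triple):
    """Hash a statement together with the hash of the block before it."""
    payload = json.dumps({"prev": prev_hash, "triple": triple}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_statement(chain, subject, predicate, obj):
    """Append one subject-predicate-object statement as a new block."""
    prev = chain[-1]["hash"] if chain else GENESIS
    triple = [subject, predicate, obj]
    chain.append({"prev": prev, "triple": triple, "hash": block_hash(prev, triple)})
    return chain


def verify(chain):
    """Recompute every hash; return False if any block was altered or reordered."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["triple"]):
            return False
        prev = block["hash"]
    return True


# Illustrative statements written with Wikidata-style identifiers.
chain = []
append_statement(chain, "wd:Q42", "wdt:P31", "wd:Q5")
append_statement(chain, "wd:Q42", "rdfs:label", "Douglas Adams")
```

Calling verify(chain) on the untouched chain returns True. If the triple inside any stored block is edited afterwards, the recomputed hash no longer matches the stored one and verify returns False, which is the auditability property the reviewed studies seek from blockchain-backed RDF storage.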