New Cardinality Notations and Styles For Modeling NoSQL Document-Store Databases
Abdullahi Abubakar Imam1,2, Shuib Basri1, Rohiza Ahmad1, Norshakirah Aziz1, María Teresa González-Aparicio3
1 CIS Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31570, Perak, Malaysia.
2 CS Department, Ahmadu Bello University, Zaria-Nigeria.
3 Computing Department, University of Oviedo Gijon, Spain.
[email protected]; {abdullahi_g03618, shuib_basri, rohiza_ahmad, norshakirah.aziz}@utp.edu.my; [email protected]
Abstract—Nowadays, data with several characteristics such as volume, variety etc. are generated daily, i.e. big data; its complexity cannot be overemphasized. On the other hand, schema-free NoSQL databases keep emerging at almost the same pace to accommodate such data, which cannot be efficiently managed by relational databases. However, this advancement brings about the challenge of modeling such flexible databases and capably managing big data despite its complexity. In doing that, developers tend to apply their relational modeling skills; nonetheless, such skills may not be directly compatible with NoSQL databases due to their schema flexibility and linear scalability, among others. To alleviate this difficulty, we propose a standard for modeling NoSQL databases, document-stores in particular. The standard can be classified as i) cardinality notations, and ii) relationship modeling styles. With such a standard, NoSQL document-store databases can be properly designed, automated database testing can be applied, and database performance and stability can be considerably improved. To achieve this, an experimental method is applied. Also, an exploratory approach was used to explore the available literature as well as expert consultations. All possible entity relationships were extracted, aggregated and compiled from a heuristic evaluation of four existing document-store databases. An experiment was conducted to assess the effect of the proposed standards; results indicate a profound improvement in various aspects of document modeling when the proposed standards are adopted, especially in large-scale databases.

Keywords— Cardinality Notations; Modeling Styles; NoSQL Databases; Big Data; Document-Store; Modeling guidelines.

I. INTRODUCTION
According to ISO, “great things happen when the world agrees” [1]. Thus the time for NoSQL standards is now. NoSQL databases have become popular for many reasons, such as their capability to handle data with numerous characteristics like variety, velocity, volume and variability, i.e. big data [2], [3], [4]. However, their heterogeneity and flexibility, coupled with developers' limited NoSQL skills, have led to low-quality designs of the NoSQL database structure [3], [5], [6], [7]. For decades, programmers have been acquainted with developing SQL databases, where schemas are enforced by the database engine; with the emergence of schema-free NoSQL databases, experts tend to apply their SQL skills in modeling NoSQL databases, especially document-store (D-store) databases, as they have commonalities with SQL databases [8], [9], [10], [11].

For instance, William (2014), Lead Technical Engineer at MongoDB, states that when modeling relationships such as one-to-many in D-store, a notation like 1:M and its concept is adopted from SQL; however, it may not always be the case in NoSQL databases. The M may be further classified into Few, Many or Squillions to complement the beauty of NoSQL databases. In addition, embedding a child document into the parent document doesn’t always signify best practice. At some point, referencing might be more suitable for better performance.

As a result of the aforementioned impediments, among others, research was conducted by [5] to mitigate the modeling issues associated with document-store databases using Formal Concept Analysis; however, their approach considers only existing relational database modeling techniques, which may not always work or may need more in-depth classifications, as explained in the previous paragraph. Consequently, a standardized guide for modeling relationships in document-store databases is proposed in this paper. This has become necessary as data increase exponentially in size and complexity every day, which progressively complicates NoSQL database modeling and increases the chances of erroneous designs; these may negatively influence system performance [11] and, at worst, lead to a system crash.

In document-store databases, depending on the nature of the data, documents are modeled as a collection of related files [7][11]. There exist a number of document-store databases, which include MongoDB, Apache Cassandra, Couchbase and CouchDB, among others [12]. In this paper, we use MongoDB for implementation of our proposed cardinality standards. MongoDB is widely embraced for its flexibility, availability of support and compatibility with many programming languages such as .Net, Java, JavaScript, PHP, Python and so on [13]. It is remarkable that eBay uses MongoDB for its online services like session management, shopping carts, preferences and product catalog. Also, Facebook, a social media website, uses MongoDB for its major project called Facebook Parse (FP). With FP, programmers can build, manage and house their mobile apps for as long as they wish on FP. This technological support generates tons of data daily from multiple users.
To achieve our goal, the four topmost document-store databases [13] are selected [4] and individually explored to identify commonalities as well as points of disparity. This leads to the extraction of modeling harmonization areas, thereby grounding our theories on a basis which can guide the proposition of the new standards. An experimental approach was adopted in which one software application with one document-store database was engineered to rigorously test the proposed standards. The cardinality notations and relationship styles proposed in this paper were modeled and implemented. It is at this point evident that NoSQL databases, especially document-store databases, require a standard modeling guide for better database design and appropriate relationship modeling.

The key contributions of this paper include:
• New cardinality notations for modeling document-store databases, taking into account embedding and referencing relationship styles.
• New relationship modeling styles.
• Trade-off analysis between modeling styles such as embedding, referencing and bucketing, which helps developers to choose between the styles while modeling their document-store databases.
• Evaluation of the proposed relationship standards using the widely used NoSQL document-store database, MongoDB [13].

II. RELATED WORK
NoSQL document-store databases are highly flexible [7]. They are based on a flexible model that allows schemas to be written and managed by the client-side application developers [5][14]. However, this may lead to incorrect or inappropriate schema design, especially when modeling relationships between datasets and entities [5][9].

Document database experts have shared their experiences on the internet about the most common questions asked by client-side application developers. Some of these questions are (i) how to model a one-to-N relationship in document databases? (ii) how does one know when to reference instead of embedding a document? (iii) do document databases allow Entity Relationship modeling at all? In an attempt to address these and similar questions, experts highlighted the necessity to come together and standardize these powerful data stores [5][11][15][9]. This is partly because many of the questions keep reappearing in multiple knowledge-sharing platforms.

As such, a few attempts were made to incorporate relational modeling techniques into NoSQL databases. [5] proposed conceptual modeling using Formal Concept Analysis (FCA). This was proposed to assist developers in modeling document-based databases. It adopted three (3) types of relationships from relational databases, which are (i) one-to-one 1:1, (ii) one-to-many 1:M, and (iii) many-to-many M:M relationships. These relationships were directly inherited from relational databases and applied onto document-store databases. This method reveals the effectiveness of the aforesaid relationships when applied to document-store databases; however, the data stored in document-stores are much more complex and bulkier than those stored in relational databases, and thus require more detailed cardinality breakdowns. Also, foreign keys are not directly supported in document-store databases. In addition, other contributing factors to document-store modeling, such as embedding, are not considered in that research despite their importance to NoSQL database modeling practice.

In a similar vein, some contributors, such as technical experts from JSON [11] and MongoDB [9], explained some ways to achieve relatively good data modeling relationships; however, the approaches are somewhat proprietary, focusing on the functionalities of the database which they set out to promote. There is a need for a generalized approach which can be followed by at least one category of NoSQL databases [5][8][16].

On the contrary, [17] agrees that the data model relegates the bulk of implementation to NoSQL programmers; therefore, an aggregate data modeling style is proposed using IDEF1X, which is the standard data modeling language. Again, in this study, relational database notations are used, as opposed to the understanding that NoSQL databases have bigger and more complicated datasets which require more detailed aggregate modeling techniques [8]. Meanwhile, an interactive, schema-on-read approach was proposed in [2] for finding multidimensional structures in document stores. Besides, [18] proposed a data migration architecture which migrates data from SQL to NoSQL document-store databases while taking into account the data models of both categories of databases.

It is therefore concluded that, as we move towards standardization in almost every aspect of technology [19][20][21], NoSQL databases should not be left behind due to the heterogeneous nature of their data model and the complexity of the data which they are designed to handle. Standardizations such as relationship modeling and data access should rather be encouraged, to have a common agreement on best practice in storing, managing and retrieving data from such powerful databases.

III. PROPOSED STANDARD
NoSQL databases, specifically document-store databases, provide high scalability, low latency, availability and partition tolerance [6]. Moreover, they support flexible schemas where databases are modeled freely without following any standard guide [9][5]. As a result, developers tend to apply any available skills, such as relational database modeling skills, to model such flexible databases [11][9]. By doing this, some key features of document-stores, like speed, become affected, as relational database modeling skills cannot be directly applied in modeling NoSQL document-store databases while attaining the maximum benefit of their potential [11][22][9]. However, some commonalities can be harnessed while eliminating some individual peculiarities. This is to simplify development hassles and minimize erroneous implementations.

This research aims at standardizing cardinality notations and styles which will be used when modeling NoSQL document-store databases. The fundamental principles of one document with respect to another are a critical aspect of relationship modeling. Therefore, in this study, standards are proposed while taking into cognizance the existing modeling expertise such as one-to-one, one-to-many etc. Initially, the cardinalities are presented, followed by relationship styles.
Proc. of the 2017 IEEE Region 10 Conference (TENCON), Malaysia, November 5-8, 2017
The following schema is presented to exemplify a one-to-squillion relationship.

    {
      _id: ObjectID("12345"),
      activity_id: "Act005",
      name: "Cash Deposit",
      date: ISODate("2017-02-04")
    }

    {
      _id: ObjectID("12346"),
      activity_id: "Act006",
      name: "Cash Withdrawal",
      date: ISODate("2017-02-04")
    }

    {
      name: "Bank Transaction Management",
      catalog_id: "9437",
      activities: [
        ObjectID("12345"),
        ObjectID("12346"),
        .. .. .. //squillions of logs reference ID
      ]
    }

5) Many-to-Many (N:M): In a many-to-many relationship, a two-sided connection between two entities is embraced. A many-to-many relationship is achieved by linking the references of the “one” side to the “many” side and vice versa.

[Figure: Comments N:M Comments]
Fig. 5 – Many-to-Many Relationship Pattern

For instance, let us consider an assignments-tracking system where there is a staff collection with a number of staff and an assignments collection which holds all assignments. Now, one or multiple staff can be assigned one or more assignments; such a scenario can be represented as follows, where assignment reference IDs are added to the staff entity.

    {
      _id: ObjectID("AS01"),
      name: "Design Curriculum",
      date: ISODate("2017-05-05"),
      Owners: [
        ObjectID("1235"),
        ObjectID("1236")
      ]
    }

    {
      _id: ObjectID("AS02"),
      name: "Teach S/W Eng",
      Owners: ObjectID("1235"),
      date: ISODate("2017-06-06")
    }

    {
      _id: ObjectID("1235"),
      name: "Shuib B",
      assignments: [
        ObjectID("AS01"),
        ObjectID("AS02")
        ... // and so on
      ]
    }

    {
      _id: ObjectID("1236"),
      name: "Rohiza Ahmad",
      assignments: [
        ObjectID("AS01")
        ... // and so on
      ]
    }

Conversely, to answer questions like “what are the assignments handled by more than one person?”, staff IDs are also added to assignments, which indicates that the connection between the two entities is bidirectional.

By referring to Table I, it is denoted that F:F and S:S have a similar structure to N:M. However, data access patterns and the nature of the application's data may significantly contribute to choosing the most appropriate model when the entities on both sides are more than one.

B. Relationship Styles
Data access patterns and the nature of the application’s data are considered the major indicators of whether a document-store schema should be modeled together, separately or bucketed. In this study, these styles of relationship are termed as shown in Table II and are briefly explained one after the other thereafter.

TABLE II. RELATIONSHIP STYLES
ID    Styles         Notations    Examples
1)    Embedding      EMB          Author ←→ Addresses
2)    Referencing    REF          Post ←→ Comments
3)    Bucketing      BUK          System ←→ Logs

Each of the terms presented in Table II above is briefly explained as follows, starting from the first in the list (Embedding):

1) Embedding (EMB): Embedding can be defined as the process of including a sub-document or multiple sub-documents inside another document. The document that is embedded is referred to as the “child” document, while the term “parent” refers to the document that incorporates other sub-documents. Two types of embedding, one-way and two-way embedding, are observed. The pattern which describes both styles is presented in Fig. 6 below and explained afterwards.

[Figure panels: 1 Embedding 1; 1 Embedding M]
Fig. 6 – Embedding Style Pattern

For example, a department in a university may incorporate other sub-documents that hold the details of all the programs/courses available. Such a type of entity attachment may be referred to as the one-way embedding style of relationship. In two-way embedding, by contrast, books and authors can be considered, where one author appears in many books and many books appear in the author’s entity.

2) Referencing (REF): Unlike embedding, which includes sub-documents in the parent document, referencing connects two or more separate documents together using a unique identifier. For instance, when document-A is said to reference document-B, the ID of document-A will be present in document-B and/or vice versa, depending on the system developer. The referencing style of relationship can be described using the following pattern.

Fig. 7 – Referencing Style Pattern
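As a minimal sketch (not the authors' implementation), the embedding, referencing and bucketing styles listed in Table II can be illustrated with plain Python dictionaries standing in for documents. The department and program values and the helper names `assignments_of`, `shared_assignments` and `bucket` are hypothetical; the staff and assignment IDs follow the listing in the text, with ObjectID values simplified to strings. Bucketing, which the text goes on to define, is shown only as fixed-size splitting by quantity.

```python
# Embedding (EMB), one-way: child documents are stored inside the parent.
# The department and program values are hypothetical.
department = {
    "_id": "D01",
    "name": "Computer and Information Sciences",
    "programs": [
        {"code": "P1", "title": "Software Engineering"},
        {"code": "P2", "title": "Information Systems"},
    ],
}

# Referencing (REF), two-way: each side stores only the IDs of the other side.
# IDs and names follow the staff/assignment listing in the text.
assignments = {
    "AS01": {"name": "Design Curriculum", "owners": ["1235", "1236"]},
    "AS02": {"name": "Teach S/W Eng", "owners": ["1235"]},
}
staff = {
    "1235": {"name": "Shuib B", "assignments": ["AS01", "AS02"]},
    "1236": {"name": "Rohiza Ahmad", "assignments": ["AS01"]},
}


def assignments_of(staff_id):
    """Resolve a staff member's assignment references into assignment names."""
    return [assignments[a]["name"] for a in staff[staff_id]["assignments"]]


def shared_assignments():
    """Return IDs of assignments handled by more than one person."""
    return [aid for aid, doc in assignments.items() if len(doc["owners"]) > 1]


def bucket(ids, size):
    """Bucketing (BUK): split a long list of references into fixed-size buckets."""
    return [
        {"bucket_no": n, "ids": ids[start:start + size]}
        for n, start in enumerate(range(0, len(ids), size))
    ]
```

With embedding, one read of `department` returns the programs as well; with referencing, a second lookup is needed to resolve IDs, but neither side grows with the size of the other side's documents. Calling `bucket(log_ids, 5000)` on a hypothetical list `log_ids` would yield documents holding at most 5000 references each.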
To explain it further, referencing is classified into two types, namely child-referencing and parent-referencing. Both types of referencing are combined and explained using a university system, where tasks are assigned to staff and/or vice versa.

3) Bucketing (BUK): Bucketing refers to the splitting of documents into smaller, manageable sizes, either by quantity, days, hours etc. It balances between the rigidity of embedding and the flexibility of referencing. Bucketing helps in document retrieval and saving. For example, using a 1:S relationship, the S side can be bucketed into smaller quantities such as 5000s to improve document retrieval speed and portability in data presentation, as in the case of pagination. The following section explains the experimental set-up, tools and their specifications.

C. Experiments
A real-world example was implemented in order to look into how the proposed cardinality notations and relationship styles affect document-store relationship modeling. The experiment was conducted with the following hardware/software: an Intel dual processor, Core i7-3632QM; CPU running at 2.20GHz * 2; and 8GB of RAM. 64-bit Windows 10 was used as the operating system; Dev-C++ as the IDE (Integrated Development Environment); C++ as the programming language; and a NoSQL document-store database (MongoDB) as the database management system. Results of these experiments are presented in the next section.

IV. RESULTS AND DISCUSSIONS
In this section, the results generated from the experiment are presented and discussed. The results are classified based on the cardinality notations presented in Table I and the styles discussed in Section III B. Each of the cardinalities is assessed to see which of the relationship styles suits it most. It is observed that some cardinalities have a very similar relationship pattern. For example, F:F and N:M presented in Table I signify a two-sided many-to-many relationship with different N or M sizes. As such, in the interest of generalization, we experiment with the cardinalities distinctly and put forward the results as follows. Each read/write operation is measured by time, in microseconds (μs).

TABLE III. RESULTS OF ONE-TO-ONE & ONE-TO-FEW (1:1 & 1:F) RELATIONSHIPS
Number of Documents      Embedding             Referencing           Bucketing
                         W-T:μs     R-T:μs     W-T:μs     R-T:μs     W-T:μs     R-T:μs
1:1 documents            742000     734000     813000     941000     -          -
1:F documents (7)        795000     771000     894000     976000     1371000    1271000

TABLE IV. RESULTS OF ONE-TO-MANY (1:M) RELATIONSHIP
Number of Documents      Embedding             Referencing           Bucketing
                         W-T:μs     R-T:μs     W-T:μs     R-T:μs     W-T:μs     R-T:μs
1:M documents (5000)     1374000    1326000    1131000    1221000    1635000    1573000

In the 1:M relationship, a document with 5000 other related documents was considered. Unlike in the previous experiments, where embedding dominated the other modeling styles, this time the results indicate a profound improvement in the referencing technique, as it leaves behind embedding and bucketing, which scored 1374000W, 1326000R, 1635000W and 1573000R respectively. The reason is that, when such a number (5000) of documents is embedded, the document becomes larger and larger, and it must be accessed each time data is read or written.

TABLE V. RESULTS OF ONE-TO-SQUILLION (1:S) RELATIONSHIP
Number of Documents      Embedding             Referencing           Bucketing
                         W-T:μs     R-T:μs     W-T:μs     R-T:μs     W-T:μs     R-T:μs
1:S docs (500000)        3601000    3579000    1871000    1931000    2412000    2161000

In the 1:S relationship, the difference between the relationship styles becomes clear: embedding does not look to be a viable option for modeling 1:S, whereas bucketing shows a slight improvement from the previous experiment, which indicates that as data size increases the relevance of bucketing becomes more pronounced. However, the referencing style has shown its power when modeling a 1:S relationship. This is a result of the decentralization embraced by the referencing style, since the increment of data does not affect the main document. So, it can be concluded that the referencing style is the choice for 1:S.

TABLE VI. RESULTS OF FEW-TO-FEW, MANY-TO-MANY AND SQUILLION-TO-SQUILLION RELATIONSHIP
Number of Documents      Embedding             Referencing           Bucketing
                         W-T:μs     R-T:μs     W-T:μs     R-T:μs     W-T:μs     R-T:μs
N:M docs (100000)        2710000    2504000    1700000    1831000    1950000    1773000

On the other hand, in the N:M relationship, bucketing and referencing styles perform very well, mostly when retrieving documents, whereas embedding does not seem to go with such a data size. This is because many of the documents are large in size, and bucketing partitioned them for faster retrieval; however, the partitioning did not work well when writing data. With all this competition, again, the referencing style has shown the best write time (1700000 μs), with a read time (1831000 μs) close to that of bucketing (1773000 μs). It is therefore concluded that referencing is a better option for the N:M relationship, followed by bucketing.
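To make the comparison across Tables III to VI easier to follow, the sketch below copies the reported timings into a Python dictionary and picks the fastest style for writes and for reads per cardinality. The dictionary layout and the function name `fastest` are assumptions made for illustration; the numbers are those reported in the tables.

```python
# Timings in microseconds copied from Tables III to VI, as (write, read) per style.
# Bucketing has no entry for 1:1 because the table reports none.
results = {
    "1:1": {"EMB": (742000, 734000), "REF": (813000, 941000)},
    "1:F": {"EMB": (795000, 771000), "REF": (894000, 976000), "BUK": (1371000, 1271000)},
    "1:M": {"EMB": (1374000, 1326000), "REF": (1131000, 1221000), "BUK": (1635000, 1573000)},
    "1:S": {"EMB": (3601000, 3579000), "REF": (1871000, 1931000), "BUK": (2412000, 2161000)},
    "N:M": {"EMB": (2710000, 2504000), "REF": (1700000, 1831000), "BUK": (1950000, 1773000)},
}


def fastest(cardinality):
    """Return (style with lowest write time, style with lowest read time)."""
    styles = results[cardinality]
    best_write = min(styles, key=lambda s: styles[s][0])
    best_read = min(styles, key=lambda s: styles[s][1])
    return best_write, best_read


if __name__ == "__main__":
    for cardinality in results:
        write_style, read_style = fastest(cardinality)
        print(f"{cardinality}: fastest write = {write_style}, fastest read = {read_style}")
```

With the reported numbers, this selects embedding for 1:1 and 1:F, referencing for 1:M and 1:S, and, for N:M, referencing for writes and bucketing for reads.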
this paper proposed new cardinality notations and relationship styles for modeling NoSQL document-store databases. To achieve this feat, an experimental approach (exploratory and confirmatory) was applied in this research. This involved exploration of the available literature, heuristic evaluation of existing document-store databases, as well as consultations with document-store experts. A rigorous experiment was conducted to assess the proposed ideas.

Results indicate a significant improvement in the general performance of NoSQL document-store databases when the standards are adopted, specifically when read/write events are performed. In addition, it is concluded that the proposed cardinalities and modeling style concepts can, without doubt, ease the development process, minimize erroneous schema implementation and improve system performance, especially in large-scale applications. Our future focus would be to propose an easier way to model NoSQL databases. Undoubtedly, modeling NoSQL databases will continue to be the focus of future research.

ACKNOWLEDGEMENT
The authors wish to acknowledge the support from Universiti Teknologi PETRONAS (UTP) for funding this research through Yayasan and Graduate Assistantship Scheme (UTP-GA).

REFERENCES
[1] ISO, International Organization for Standardization Strategy 2016 - 2020. Switzerland: International Organization for Standardization, 2016.
[2] M. L. Chouder, S. Rizzi, and R. Chalal, “Enabling Self-Service BI on Document Stores,” Work. Proceedings Jt. Conf. Venice, Italy, 2017.
[3] P. Atzeni, F. Bugiotti, and L. Rossi, “Uniform access to NoSQL systems,” Inf. Syst., vol. 43, pp. 117–133, 2014.
[4] J. G. Enríquez, F. J. Domínguez-Mayo, M. J. Escalona, M. Ross, and G. Staples, “Entity Reconciliation in Big Data Sources: a Systematic Mapping Study,” Expert Syst. Appl., vol. 80, pp. 14–27, 2017.
[5] V. Varga, K. T. Jánosi, and B. Kálmán, “Conceptual Design of Document NoSQL Database with Formal Concept Analysis,” Acta Polytech. Hungarica, vol. 13, no. 2, pp. 229–248, 2016.
[6] M. T. Gonzalez-Aparicio, M. Younas, J. Tuya, and R. Casado, “A New Model for Testing CRUD Operations in a NoSQL Database,” in 2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA), 2016, vol. 6, pp. 79–86.
[7] J. Han, E. Haihong, G. Le, and J. Du, “Survey on NoSQL database,” Proc. - 2011 6th Int. Conf. Pervasive Comp. Appl., pp. 363–366, 2011.
[8] P. Atzeni, “Data Modelling in the NoSQL world: A contradiction?,” no. June, pp. 23–24, 2016.
[9] Z. William, “6 Rules of Thumb for MongoDB Schema Design,” MongoDB, 2014. [Online]. Available: https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1. [Accessed: 23-Jan-2017].
[10] R. April, “NoSQL Technologies: Embrace NoSQL as a relational Guy – Column Family Store,” DBCouncil, 2016. [Online]. Available: https://dbcouncil.net/category/nosql-technologies/. [Accessed: 21-Apr-2017].
[11] R. CrawCuor and D. Makogon, Modeling Data in Document Databases. United States: Developer Experience & Document DB, 2016.
[12] N. Jatana, S. Puri, and M. Ahuja, “A Survey and Comparison of Relational and Non-Relational Database,” Int. J., vol. 1, no. 6, pp. 1–5, 2012.
[13] M. Gelbmann, “DB-Engines Ranking of Document Stores,” DB-Engines, 2017. [Online]. Available: https://db-engines.com/en/ranking/document+store. [Accessed: 21-Feb-2017].
[14] T. A. Alhaj, M. M. Taha, and F. M. Alim, “Synchronization Wireless Algorithm Based on Message Digest (SWAMD) For Mobile Device Database,” 2013 Int. Conf. Comput. Electr. Electron. Eng. Synchronization, pp. 259–262, 2013.
[15] G. Matthias, “Knowledge Base of Relational and NoSQL Database Management Systems: DB-Engines Ranking per database model category,” DB-Engines, 2017. [Online]. Available: https://db-engines.com/en/ranking_categories. [Accessed: 21-Apr-2017].
[16] M. J. Mior, “Automated schema design for NoSQL databases,” Proc. 2014 SIGMOD PhD Symp. - SIGMOD’14 PhD Symp., pp. 41–45, 2014.
[17] V. Jovanovic and S. Benson, “Aggregate Data Modeling Style,” SAIS 2013 Proc., pp. 70–75, 2013.
[18] M. Mughees, “DATA MIGRATION FROM STANDARD SQL TO NoSQL,” 2013.
[19] J. Häkkilä, “Developing Design Guidelines for Context-Aware Mobile Applications,” Proc. 3rd Int. Conf. Mob. Technol. Appl. Syst. ACM, 2006, pp. 1–7, 2006.
[20] J. Rodriguez, “Guidelines for designing usable world wide web pages,” Conf. Companion Hum. Factors Comput. Syst. ACM., pp. 277–278, 1996.
[21] J. Gong and P. Tarasewich, I. Science, “Guidelines for Handheld Mobile Device Interface Design,” Proc. DSI Annu. Meet., pp. 3751–3756, 2004.
[22] W. Naheman, “Review of NoSQL Databases and Performance Testing on HBase,” Int. Conf. Mechat. Sci. E. Eng. Comp., pp. 2304–2309, 2013.