
Performance Comparison of NOSQL Database Cassandra and SQL Server for Large Databases

Khalid Mahmood
Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology, Karachi, Pakistan
[email protected]

Abstract--A performance comparison of a NoSQL database and a Relational Database Management System (RDBMS) was carried out to identify which database responds faster to specific types of requests and how suitable each is for different scenarios. Cassandra was taken as the sample NoSQL database, and its performance was compared with that of a front-line relational database, SQL Server 2012.

The new category of databases, the NoSQL databases, are horizontally scalable, which makes them well suited to data centers that require very large databases with a variety of data types. The performance of SQL Server 2012 and Cassandra was compared in a limited scenario, but it was quite clear that for the kind of database required for business, relational databases are the choice. NoSQL technology is improving at a fast pace and different types of databases are coming onto the market. New schema-free environments and flexible table designs offer a lot to look forward to. The four different types of NoSQL databases provide specialized utilization for specific technology areas.

Keywords--NoSQL, RDBMS, Distributed Databases, Schema Free, Performance Comparison

I. INTRODUCTION

A) Evolution of Relational Database Management Systems

Relational databases have been in use for more than 30 years, since E. F. Codd came up with the relational model. The database systems progressed over the years and evolved into formidable systems with no competitor. Big names in this domain are IBM’s DB2, Oracle, Sybase and SQL Server by Microsoft. Relational databases excelled in providing strong support for query languages and followed the four distinguishing properties of Atomicity, Consistency, Isolation and Durability (ACID). They became the mainstay of business databases.

B) Emergence of NoSQL Database

Numerous different types of database emerged as a result of the advent of social media and of large storage available at very low prices. Currently there are three new types of databases in use in the industry for processing these data types. NoSQL ("Not only SQL") is a generic name for all databases that are not relational; for information retrieval, the use of a query language is not mandatory. NoSQL databases allow storage and retrieval of data that was not stored in relational databases. Being schema-free, NoSQL databases support easy replication, attain an eventually consistent state and are capable of handling huge amounts of data. The main reasons for having a NoSQL database are simple design, horizontal scalability and very fine control over the availability of the database.

C) Main Distinguishing Features of NoSQL Databases

The main distinguishing feature of NoSQL databases is that their data structures differ from those of an RDBMS. These structures make certain NoSQL operations faster than in relational databases. NoSQL databases support only simple queries, whereas queries in relational databases can be very complex and involve multiple joins. Unlike relational databases, NoSQL databases do not have a fixed schema; the schema can be modified at run time, as and when required. These databases have been designed to work on clusters of servers that may or may not be at one location or in one data center. Their distributed nature restricts these databases to achieving an "eventually consistent" status. There are many different types of NoSQL database, and the suitability of each depends on the specific problem it is used to solve. Different NoSQL databases have special uses in different domains. Cassandra is one of the most popular and has been the mainstay of Facebook for a very long time. In fact, Cassandra was developed by Facebook.

D) Column Family Stores

These databases are meant for storing very large volumes of data, particularly when the data is distributed across many servers that may be at different locations. There can be multiple keys pointing to many columns. These columns are combined into tables called column families. Cassandra and HBase are popular Column Family Stores.
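The column family idea described in Section I.D (row keys pointing to sets of columns, with no fixed schema) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `ColumnFamily` class, its method names and the sample rows are hypothetical and are not part of Cassandra or HBase.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy model of a column family: row key -> {column name: value}.

    Hypothetical illustration only; real column family stores add
    partitioning, replication and on-disk storage on top of this idea.
    """

    def __init__(self, name):
        self.name = name
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Any row may carry any set of columns (schema-free).
        self.rows[row_key][column] = value

    def get(self, row_key, column=None):
        # Use dict.get so that a lookup never creates an empty row.
        row = self.rows.get(row_key, {})
        return dict(row) if column is None else row.get(column)


orders = ColumnFamily("orders")
orders.put("order-1", "status", "shipped")
orders.put("order-1", "total", 120.5)
orders.put("order-2", "status", "open")
orders.put("order-2", "customer", "C42")  # a column that order-1 does not have

print(orders.get("order-1"))
print(orders.get("order-2", "total"))
```

The two rows carry different columns, which is the flexibility the text attributes to schema-free stores; a relational table would require both rows to share one pre-defined set of columns.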

Journal of Independent Studies and Research – Computing Volume 14 Issue 2 July-Dec 2016 21
E) Motivation

Comparative studies exist for databases of the same class, such as comparisons among NoSQL databases [1]. There is a study on the comparative performance evaluation of MySQL and MongoDB at the University of Edinburgh [2]. On the other hand, various studies have compared Relational Database Management Systems [3]. The comparison of Cassandra vs. Microsoft SQL Server in [4] compares the system properties but not the comparative performance in read and write operations.

A performance comparison between Cassandra and SQL Server will help a user decide when to switch from a relational DBMS to a NoSQL database and vice versa. The focus of this study is to measure the time taken by write and read operations in both databases on a single node for a large database.

F) Experimental Framework

For the comparison, three different operations are used on both databases. Tables were created containing 1,000,000 records, considered enough to check performance on a single node. The operations planned are:

1. Read. This reads the data against a key from the key-value pair storage in Cassandra and against a key in SQL Server. It corresponds to the Select operation used in relational databases, which in turn corresponds to Read in ‘Create, Read, Update, and Delete’ (CRUD).

2. Write. Data saved in other formats can be written into Cassandra; if the related data is not already in the database it is created, otherwise it is updated. The Write operation thus combines the Create and Update operations of relational databases.

3. Select. Cassandra supports a select statement, and an SQL-like statement written in CQL can be used to select the desired data. A simple select statement is constrained to selecting no more than 15,000 records at a time.

II. LITERATURE REVIEW

A) NoSQL: Non-Relational Databases of Next-Generation

Distributed databases are now the standard for data storage in the Web 2.0 applications used by front-line web and social media organizations such as Google, Facebook, LinkedIn, Twitter and Yahoo!. All of these process very large databases on the scale of petabytes. Although RDBMSs do provide simplicity, robustness and performance, they are limited in their flexibility to scale with the database application, be it on the Grid or an implementation on the cloud. The next-generation databases are generally distributed, open source and highly scalable horizontally. NoSQL databases are non-relational, mostly schema-free and join-free, and support easy replication. A good understanding of the design of non-relational databases and a comparison between relational and NoSQL architectures has identified important research directions in this area [5].

B) A Distributed Storage System - Cassandra

Cassandra has been developed as a decentralized, distributed storage system for very large databases spread over numerous commodity servers, providing reliable service that tolerates the failure of one or more nodes. Cassandra can run on machines spread across multiple locations, where multiple failures are likely. It manages the persistent state in the face of these failures and provides scalability and reliability to the systems that use it. Though Cassandra is similar to a relational database to a certain extent, it does not have the features of an RDBMS; instead it provides dynamic control over how data is laid out. Facebook, the largest social networking platform, serves millions of users and uses tens of thousands of servers housed in data centers across the globe [6].

C) Performance comparison of NoSQL databases and SQL Express

Most NoSQL databases are built on the premise of storing data as key-value pairs. High-speed Internet and cheap storage have encouraged the capture of all kinds of semi-structured and unstructured data from a variety of applications in the organization. A new term, Big Data, has emerged that covers all kinds of data coming from multiple sources. Processing Big Data requires speed, flexible schemas and distributed databases. The comparison focused on read, write and delete operations on multiple key-value databases. It was found that performance varied widely among NoSQL databases. It was also observed that there was practically no correlation between the data model used and the corresponding performance.

The comparison was made primarily between NoSQL databases, including Couchbase, MongoDB, Cassandra, Hypertable, CouchDB and RavenDB, and a relational database, SQL Express. It was found that not all NoSQL databases perform better than SQL Express. Within the NoSQL databases, a wide deviation was found depending upon the type of operation [7].
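The semantics of the three operations listed in the experimental framework (Section I.F) can be made concrete with a toy table: Write behaves as an upsert that merges Create and Update, Read looks a row up by key, and Select is capped at the 15,000-record limit reported in the text. The sketch below is a hedged, in-memory stand-in only; `ToyTable`, `MAX_SELECT_ROWS` and the sample rows are hypothetical and do not reflect how Cassandra or SQL Server implement these operations.

```python
from itertools import islice

# Assumption: the 15,000-record ceiling reported in the text, modelled as a constant.
MAX_SELECT_ROWS = 15_000


class ToyTable:
    """Hypothetical in-memory table keyed by a primary key (not Cassandra itself)."""

    def __init__(self):
        self._rows = {}

    def write(self, key, columns):
        """Upsert: create the row if the key is new, otherwise update it."""
        existed = key in self._rows
        self._rows.setdefault(key, {}).update(columns)
        return "updated" if existed else "created"

    def read(self, key):
        """Return a copy of the row stored under key, or None if absent."""
        row = self._rows.get(key)
        return None if row is None else dict(row)

    def select(self, limit):
        """Return up to `limit` rows, refusing requests above the ceiling."""
        if limit > MAX_SELECT_ROWS:
            raise ValueError(f"limit {limit} exceeds {MAX_SELECT_ROWS}")
        return [dict(row) for row in islice(self._rows.values(), limit)]


table = ToyTable()
first = table.write(1, {"status": "open"})   # key is new, so the row is created
second = table.write(1, {"total": 99.5})     # key exists, so the row is updated
print(first, second, table.read(1))
```

A single `write` call covers both the Create and the Update case, which is why the study measures Write as one operation rather than two.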

D) RDBMS to NoSQL, a Continuous Evolution

Relational databases are not proving suitable for new-generation web-based applications that support millions of users and data distributed across multiple servers. The new technologies named NoSQL databases have now matured enough to offer very cost-effective solutions for mobile and web applications. The new NoSQL databases support applications that have large transaction volumes but need to perform with low latency. To meet the complexity of databases for web applications, companies started building their own databases for their special workloads. These in-house developments are the main inspiration behind the current NoSQL databases. The authors have correctly identified the scenarios in which an organization should move from relational databases to NoSQL databases. The main considerations are the type of application that has been written, the kind of queries that users expect and any variations that may be expected in the database design [8].

III. METHODOLOGY FOR PERFORMANCE COMPARISON

A) Configuration

The tests were run on a machine with an i3 processor. SQL Server 2012 was used for the performance comparison. For Cassandra, the DataStax installation of Cassandra 3.4 was used. The PC configuration was as follows:

System Type: x64-based PC
Processor: Intel(R) Core(TM) i3-2310M CPU @ 2.10GHz, 2 Core(s), 4 Logical Processor(s)
Total Physical Memory: 3.94 GB
OS: Microsoft Windows 10 Professional

B) Relational Database Management Systems: SQL Server 2012

For testing the performance, the AdventureWorks2012 sample database in SQL Server 2012 was used. A special table, salesorderheader1m, was built for running the tests by running an SQL script against the SQL Server SalesOrderHeader table. The table had a row count of 1,000,000 with 26 columns of different data types and a data size of 226.844 MB.

This table was specially created to carry out tests on a single-processor machine for ease of handling. The test started with an empty database created in Cassandra with the same attributes, but assigned Cassandra data types. The data from SQL Server was exported to a CSV file for subsequent loading into Cassandra. The tests for Cassandra and SQL Server checked for any ongoing processes and waited until those completed before continuing. This was done because the performance of Cassandra, and even of SQL Server, degraded when some other application was already running.

C) Cassandra Setup

Cassandra version 3.4 was installed from DataStax. It used the IP address of the personal computer used for the tests. For Cassandra testing, the keyspace and column family were created by running commands via the cqlsh command line utility. The following command was used to create the keyspace:

CREATE KEYSPACE testdat WITH REPLICATION =
{'class': 'NetworkTopologyStrategy', 'dc1' : 3 };

The column family salesorderheader1m was created with the same attributes as the SQL Server table.

IV. COMPARATIVE ANALYSIS

For performance analysis, the following three tests were carried out:

a. Comparison of read performance of CSV into database
b. Comparison of write performance of database table into CSV format
c. Comparison of SELECT performance for a limited number of records

A) Import Data Analysis

In importing the database into SQL Server and Cassandra, the performance of SQL Server was far superior. While Cassandra took on average over 321 seconds, SQL Server took on average about 34.42 seconds. These results show that SQL Server has better throughput when importing data from another format, as shown in Fig. 1.

Fig. (1). Import of Data by SQL Server and Cassandra
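The import timings reported above can be turned into throughput figures with simple arithmetic. The sketch below uses only the numbers stated in the text (1,000,000 rows, 321 seconds for Cassandra, 34.42 seconds for SQL Server); the variable and function names are illustrative assumptions. Because Cassandra is reported as taking "over 321 seconds", its throughput here is an upper bound and the speed-up a lower bound.

```python
ROWS = 1_000_000  # size of the test table reported in the text

# Average import times in seconds, as reported in the text.
import_seconds = {"Cassandra": 321.0, "SQL Server": 34.42}


def rows_per_second(rows, seconds):
    """Average throughput for a bulk operation."""
    return rows / seconds


throughput = {
    name: rows_per_second(ROWS, seconds)
    for name, seconds in import_seconds.items()
}
# How many times faster SQL Server completed the import.
speedup = import_seconds["Cassandra"] / import_seconds["SQL Server"]

for name, value in throughput.items():
    print(f"{name}: {value:,.0f} rows/s")
print(f"SQL Server import speed-up: {speedup:.1f}x")
```

On these figures SQL Server imports roughly nine times faster than Cassandra on the single-node test machine.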

B) Export Data Analysis

In exporting data, the performance of SQL Server was again far superior to that of Cassandra. Whereas Cassandra took on average 322.6 seconds to export 1,000,000 records, SQL Server took just 20.3 seconds, as shown in Fig. 2.

Fig. (2). Export of Data by SQL Server and Cassandra

C) Select / Read Records

Time was measured for a select command over all the attributes of the table. While SQL Server was comfortable selecting and displaying all the records at once, for Cassandra a limit had to be specified, with a maximum value of 15,000. Three measurements were therefore taken, selecting 5,000, 10,000 and 15,000 records from both databases. Again, SQL Server performed better than Cassandra. In the cqlsh shell, Cassandra had a page limit of 100 records that could not be increased, so DevCenter for Cassandra had to be installed. This installation was rather tricky, as it required a 64-bit JVM environment. The tests for Cassandra were run and the comparative results are shown in Fig. 3.

Fig. (3). Select / Read Records by SQL Server and Cassandra

V. CONCLUSION AND FUTURE WORKS

A) Conclusion

This study aimed at investigating and comparing the performance and scaling of a NoSQL database and a Relational Database Management System. The performance of both databases was explored to a limited extent, to find which database responds faster to specific types of requests and how suitable each is for different scenarios: which technology is more suitable than the other, and under what circumstances? Relational databases were developed in the 1980s with specific structures in mind and built to have tables with columns and rows and a pre-defined schema. Most importantly, the database schema gives a logical view of the database and of the relations between tables, allowing the creation of databases that are very quick to respond and easy to design, with guaranteed reliability and technically no duplication. The NoSQL databases are relatively new and have become popular because they provide horizontal scalability, which has made them very suitable for data centers that require very large databases.

The study tested, compared and analyzed the performance of two databases, SQL Server 2012 and Cassandra. The experiments were constrained because they were run on just a PC with 4 GB of RAM and an i3 processor. Complex queries could not be run on the machine. A remarkable performance difference was found in the import and export of data. The export that took Cassandra over five minutes (322.6 seconds on average) was completed by SQL Server in about 20 seconds. The same was true of importing the CSV file with one million records, where the gap between the two databases was similar. Selecting data through SQL commands on SQL Server 2012 and through CQL on Cassandra also showed a remarkable performance difference. For queries that fetched 5,000 records, SQL Server was twice as fast. When the number of records fetched was increased to 10,000, the time increased threefold; at 15,000 records, it increased sevenfold. This aspect could not be checked further, as Cassandra gave an error when selecting more than 15,000 records.

For a comprehensive performance study, a good option would be to acquire resources from the cloud, where a NoSQL database could benefit from the distributed environment and demonstrate enhanced processing capability. Machines with larger RAM and a multiprocessing environment are more conducive to NoSQL databases, whereas relational databases would perform much better on a single server with extended processing capability. NoSQL technology is evolving and improving every day, with new schema-free environments and very flexible database table designs. The four different types of NoSQL databases are providing specialized utilization for

specific technology areas. Key-value stores are the simplest type of database management system: they store pairs of keys and values, and the data can be retrieved only when the key to the record is known. Although they are not suitable for very complex database designs, their simple architecture makes them suitable for specific applications. Their main applications are in embedded systems. They are also used as in-process databases where high performance is key. Column Family databases are good for storing very large databases, especially in a distributed environment where the data is spread over many servers. Multiple keys pointing to multiple columns are generally arranged into column families. Cassandra is one very popular Column Family Store.

Another type of NoSQL database, the Document-Oriented database, facilitates the storage, retrieval and management of semi-structured data. These are a kind of key-value store. The difference is in how they process the data: a key-value store considers the data to be opaque to the database, but a document-oriented system may use the internal structure of the document to retrieve metadata used by the database engine. CouchDB and MongoDB are popular Document Databases. The fourth type of NoSQL database, the Graph database, uses a graph data model that is flexible and can be scaled very comfortably across multiple servers. Again, Graph Databases do not offer any advanced query processing like SQL and thus avoid the overhead of handling joins. Queries on such databases are specific to the data model. Neo4J, Infinite Graph and InfoGrid are popular Graph Databases.

B) Future Work

For a clear line of thinking, there is a need for further investigation into defining the linkage between the type of database and its applications in industry. The four database types discussed above offer options for different applications, but the processes are so complex that new users are reluctant to adopt the new technologies and as such are not benefiting from the new databases. There is a need to develop database systems with IDEs like those of the relational databases and a standardized language interface for convenient handling.

REFERENCES

[1] Benchmarking Top NoSQL Databases Apache Cassandra, Couchbase, HBase, and MongoDB, [Online]. Available: http://www.endpoint.com/.
[2] C. Hadjigeorgiou, "RDBMS vs NoSQL: Performance and Scaling Comparison." M.S. dissertation, The University of Edinburgh, 2013.
[3] Y. Bassil, "A comparative study on the performance of the Top DBMS systems," Journal of Computer Science & Research, vol. 1, no. 1, pp. 20-31, 2012.
[4] System Properties Comparison Cassandra vs. Microsoft SQL Server, [Online]. Available: http://db-engines.com/en/system/Cassandra%3BMicrosoft+SQL+Server
[5] R. P. Padhy, M. R. Patra, and S. C. Satapathy, "RDBMS to NoSQL: Reviewing some next-generation non-relational databases." International Journal of Advanced Engineering Science and Technologies, vol. 11, no. 1, pp. 15-30, 2011.
[6] A. Lakshman and P. Malik, "Cassandra: a decentralized structured storage system." ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35-40, 2010. Doi: 10.1145/1773912.1773922
[7] Y. Li and S. Manoharan, "A performance comparison of SQL and NoSQL databases." In Proceedings of IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), 2013.
[8] C. Nance, T. Losser, R. Iype and G. Harmon, "Nosql vs rdbms - why there is room for both." In Proceedings of the Southern Association for Information Systems Conference, 2013.

© Author(s) 2016. CC Attribution 4.0 License. (http://creativecommons.org/licenses/by-nc/4.0/)

This article is licensed under the terms of the Creative Commons Attribution Non-Commercial License which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.

